Abstract
Comparative analyses suggest that the MHC was derived from a prevertebrate “primordial immune complex” (PIC). PIC duplicated twice in the well-studied two rounds of genome-wide duplications (2R) early in vertebrate evolution, generating four MHC paralogous regions (predominantly on human chromosomes [chr] 1, 6, 9, 19). Examining chiefly the amphibian Xenopus laevis, but also other vertebrates, we identified their MHC paralogues and mapped MHC class I, AgR, and “framework” genes. Most class I genes mapped to MHC paralogues, but a cluster of Xenopus MHC class Ib genes (xnc), which previously was mapped outside of the MHC paralogues, was surrounded by genes syntenic to mammalian CD1 genes, a region previously proposed as an MHC paralogue on human chr 1. Thus, this gene block is instead the result of a translocation that we call the translocated part of the MHC paralogous region (MHCtrans). Analyses of Xenopus class I genes, as well as MHCtrans, suggest that class I arose at 1R on the chr 6/19 ancestor. Of great interest are nonrearranging AgR-like genes mapping to three MHC paralogues; thus, PIC clearly contained several AgR precursor loci, predating MHC class I/II. However, all rearranging AgR genes were found on paralogues derived from the chr 19 precursor, suggesting that invasion of a variable (V) exon by the RAG transposon occurred after 2R. We propose models for the evolutionary history of MHC/TCR/Ig and speculate on the dichotomy between the jawless (lamprey and hagfish) and jawed vertebrate adaptive immune systems, as we found genes related to variable lymphocyte receptors also map to MHC paralogues.
This article is featured in In This Issue, p.1681
Introduction
The “2R hypothesis” has proposed that the early vertebrate genome experienced two rounds of genome-wide duplications (1). Indeed, there are four paralogous clusters of genes in the genomes of all jawed vertebrates, first studied in humans for homeobox and MHC genes (2, 3). When genes or genetic regions are duplicated, some loci preserve their original function, whereas others are modified (neofunctionalization or subfunctionalization) or may experience differential silencing. Other types of genome modifications may occur, such as translocation of block regions, at times blurring the origins of a particular genetic region.
As mentioned, the MHC was one of the original gene clusters noted for its paralogous regions (or “ohnologues”), found on human chromosomes (chr) 6 (MHC), 1, 9, and 19 (MHC paralogues [MHCpara]) (3, 4). Further analysis using the insulin/relaxin and neurotrophin/neurotrophin receptor family genes revealed that there are additional regions containing paralogous genes in a similar order (5–7), and it has been suggested that the precursors of these regions and MHCpara were syntenic during the preduplication era, but some were translocated over evolutionary time. These detached regions include sections of human chr 12, 14, and 15, and are generally shorter than the original regions; we refer to these detached regions as “minor MHCpara,” and the original four regions as “major MHCpara.”
The MHC harbors many genes involved in adaptive and innate immunity (6, 8). Central to the adaptive immune system, the Ag-presenting MHC class I and class II molecules work in concert with Ag-processing (immunoproteasomes), peptide-transporting (TAP), peptide-editing (DM, TAPBP), and other molecules to present antigenic peptides recognized by TCR. Precursors of these genes were likely derived from the so-called primordial immune complex (PIC), predating the genome-wide duplications in early vertebrates (9). Indeed, analysis of several invertebrate genomes [e.g., the deuterostome amphioxus (Branchiostoma lanceolatum) (10) and the placozoan Trichoplax adhaerens (11)] revealed conserved synteny of proteasome and “framework” genes (i.e., nonimmune genes in the MHC). Unfortunately, to date no candidate class I/II genes have been detected in species derived from ancestors predating the jawed vertebrates, and thus most genes strictly involved in adaptive immunity (based on MHC, Ig, TCR) seem to have appeared “suddenly” in a gnathostome ancestor. Because both MHC and MHCpara are derived from a preduplicated precursor region in a common vertebrate ancestor (3, 6, 9), analysis of these regions from different extant vertebrates provides insight into the evolutionary history of the MHC and its precursor.
Previous work on the paralogous regions has focused only on mammals. In this study, we took advantage of the published work in humans and focused on the genome of the amphibian Xenopus. Previous studies showed that the Xenopus genome is relatively stable and preserves some primordial features that were lost in other vertebrates (12), thus serving as a complementary model system to study genome evolution. We used the true diploid Xenopus tropicalis (13) and especially the tetraploid Xenopus laevis (14), in which the genomes have been recently sequenced and analyzed. In combination with comparative genomic analyses, we obtained evidence for the timing of emergence of MHC class I/II and AgR genes. We further propose a model for the evolution of the human chr 1q21.1–23.3 region, including the CD1 genes, and reflect on the dichotomy between the jawed and jawless vertebrate adaptive immune systems.
Materials and Methods
Data mining
We examined gene models (i.e., software-generated conceptual translations) in the scaffolds and genome assembly, with subsequent manual validation/annotation. Additionally, we performed tblastn searches to find genes that were overlooked by the gene-finder software at the web portal. Chromosomal locations of Xenopus genes were obtained from mapped BAC clones using fluorescence in situ hybridization (FISH) methods described elsewhere (15). All information is publicly available through Xenbase (http://xenbase.org) (X. laevis v7.1 and 9.1, X. tropicalis v8 and 9) and the National Center for Biotechnology Information (NCBI) (http://ncbi.nlm.nih.gov). We found inconsistent assemblies among different X. tropicalis versions as well as between X. laevis and X. tropicalis. More extensive mapping has been done with X. laevis chromosomes, and thus the X. laevis genome was largely used for this study. Genomic data from vertebrates other than Xenopus were obtained from various databases in GenBank at NCBI. Gene models from the X. laevis genome are found at NCBI: VJC11310 (ACB47447); VJC1258 (OCT67647); VJC1406 (OCT69143-7); class Ib112 (XP_018111305); class Ib145 (OCT68671); class Ib16004 (XP_018109328). Note that these gene model-based sequences are predicted and thus may not always reflect the RNA sequence. We found that most Ig superfamily (IgSF) domains encoded within a single exon are predicted reliably, although exon-intron boundaries are occasionally inaccurate.
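The tblastn step described above produces hit lists that must be filtered before manual annotation. The following is a minimal sketch of such filtering, assuming the search was run with the BLAST+ tabular output format (`-outfmt 6`, whose 12 default columns are listed in `FIELDS`). The function name, the e-value cutoff, and the example hit lines are hypothetical illustrations and are not taken from this study.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits_per_scaffold(tabular_text, max_evalue=1e-5):
    """Return {query: {scaffold: (start, end, evalue)}}.

    Keeps only hits at or below max_evalue (a hypothetical cutoff) and,
    for each query/scaffold pair, the hit with the lowest e-value.
    """
    best = defaultdict(dict)
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue
        # Reverse-strand hits are reported with sstart > send,
        # so normalize to an ascending (start, end) interval.
        sstart, send = int(hit["sstart"]), int(hit["send"])
        start, end = min(sstart, send), max(sstart, send)
        previous = best[hit["qseqid"]].get(hit["sseqid"])
        if previous is None or evalue < previous[2]:
            best[hit["qseqid"]][hit["sseqid"]] = (start, end, evalue)
    return {query: dict(scaffolds) for query, scaffolds in best.items()}


# Made-up hit lines for illustration only.
EXAMPLE = (
    "queryA\tscaffold_1\t62.5\t88\t33\t0\t1\t88\t5000\t4737\t2e-30\t110\n"
    "queryA\tscaffold_1\t40.0\t50\t30\t0\t1\t50\t900\t1049\t3e-08\t52\n"
    "queryA\tscaffold_2\t30.0\t40\t28\t0\t5\t44\t10\t129\t0.5\t25\n"
)
hits = best_hits_per_scaffold(EXAMPLE)
```

Candidate loci retained this way would still need the manual validation described in the text, because a low e-value alone does not establish a correct gene model.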
Statistical validation of conserved synteny
Synteny probability was calculated using the method described by Danchin et al. (16): to test whether the Xenopus regions of interest are in synteny with their human counterparts, we calculated the binomial probability that the shared genes were organized together by chance. In this binomial calculation, x is the number of homologs of human genes found in the Xenopus region and p is the proportion of genes in the hypothesized human region (i.e., number of genes divided by the 20,199 total protein-coding genes in the human reference GRCh38 dataset at NCBI). This gives the probability that our selected Xenopus regions have the same complement of genes as humans by chance. To keep the gene criteria consistent, we obtained protein-coding genes from the Xenopus_laevis_v2 dataset at NCBI.
For all reported statistics, we included both hypothetical and duplicated genes in Xenopus to give a conservative estimate of synteny; excluding these gene subsets leads to the same interpretation and, if anything, lowers the probability of synteny by chance.
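As an illustration of the kind of calculation described in this section, the sketch below computes a binomial upper-tail probability in log space. It rests on assumptions that are not confirmed by the text: that the reported value is the probability of observing at least x homologs (rather than exactly x), and that the number of trials n is the total number of genes in the Xenopus region. The function names are hypothetical. Working with logarithms avoids floating-point underflow, which matters because the individual binomial terms are extremely small when x is large and p is small.

```python
import math

HUMAN_PROTEIN_CODING_GENES = 20199  # total stated in the text


def region_proportion(genes_in_human_region, total=HUMAN_PROTEIN_CODING_GENES):
    """Proportion p of human protein-coding genes lying in the region."""
    return genes_in_human_region / total


def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability of exactly k successes."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_upper_tail_log10(x, n, p):
    """log10 of P(X >= x) for X ~ Binomial(n, p), with 0 < p < 1.

    Assumption: an upper-tail probability is what is wanted here.
    """
    logs = [log_binom_pmf(k, n, p) for k in range(x, n + 1)]
    peak = max(logs)
    # Log-sum-exp: factor out the largest term before exponentiating.
    total = peak + math.log(sum(math.exp(value - peak) for value in logs))
    return total / math.log(10)


# Inputs in the style of one row of the synteny table: 327 human genes in
# the hypothesized region, 172 homologs among 220 genes in the Xenopus region.
p = region_proportion(327)
log10_probability = binom_upper_tail_log10(172, 220, p)
```

Returning the base-10 logarithm rather than the probability itself keeps the result representable even when the probability is far below the smallest positive double-precision number. Values obtained this way need not match the published figures, which depend on the exact method and software used.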
Results
Two divergent subgenomes in the tetraploid X. laevis
X. laevis is an allotetraploid (4n) species, generated by hybridization of two divergent ancestral diploid (2n) Xenopus species (subgenomes long [L] and short [S]), and thus its genome contains sets of paired, or homeologous, chromosomes (i.e., 1L ∼ 9L and 1S ∼ 9S; n = 18). These two subgenomes have been independently maintained, with no detectable intergenome recombination (14). Genome-wide analysis further revealed that synteny is generally well conserved between L and S chromosomes, but gene loss, when it occurs [often the case for many adaptive immune genes (17)], is much more frequent on S chromosomes (14). Gene content of the L chromosomes is most similar to the genome of the true diploid X. tropicalis. Although most housekeeping genes are present on both chromosomes, most class I (except a few class I–like genes), AgR, and AgR-like genes discussed in this report were diploidized and thus found only on the L chromosomes, and therefore we focused our analyses on the L chromosomes.
Xenopus MHC and identification of major and minor MHCpara regions
The Xenopus MHC was previously mapped by FISH to chr 8 (18) and is now precisely mapped to 8Lq21. To identify Xenopus MHCpara, we used sets of paralogous hallmark genes that were originally used to identify the human MHCpara (huMHCpara) (3) (e.g., notch1, 2, 3, 4; pbx1, 2, 3, 4; rxra, b, g; and complement c3, 4, 5, a2m). Other conserved paralogues such as brd1, 2, 3, 4 were not all detected in the current Xenopus assemblies and thus were excluded from analyses. As in humans, we found four sets of clustered paralogous hallmark genes on Xenopus chromosomes: 8Lq21 (MHC), 4Lq24-25, 8Lp11-12, and 3Lq33-34, as well as orthologs of the human minor MHCpara on 1Lq and 7Lp23-24 (Fig. 1, Table I; hallmark genes in red).
MHC and MHCpara

| MHCpara | Gene | Human chr. | X. laevis chr.ᵃ | Scaffold (v7.1) | Position (v7.1)ᵇ | FISH BAC | Position (v9.1)ᵇ |
|---|---|---|---|---|---|---|---|
| MHC-6 | TAPBP | 6p21.3 | 8Lq14-21 | 50694 | 6,954,053..6,969,886 | 108L10 | 50,739,635..50,755,022 |
| | RXRB | 6p21.3 | 8Lq14-21 | 50694 | 7,102,020..7,117,938 | 106L10 | 50,887,807..50,903,520 |
| | PSMB8 | 6p21.3 | 8Lq21 | 75398 | 274,622..283,843 | 290K18 | 78,508,537..78,523,720 |
| | PSMB9 | 6p21.3 | 8Sq21 | 12933 | 4,797,175..4,812,351 | 044A14 | 78,508,537..78,523,720 |
| | PBX2 | 6p21.3 | 8Lq21 | 75398 | 378,082..396,079 | 114D22 | 51,636,291..51,653,761 |
| | NOTCH4 | 6p21.3 | 8Lq21 | 75398 | 337,685..353,721 | 114D22 | 59,569,525..51,611,344 |
| | C4 | 6p21.33 | 8Lq21 | 75398 | 475,934..520,806 | 114D22 | 51,733,637..51,778,496 |
| | PSMB10 | 16q22.1 | 8Lq21 | 75398 | 524,686..539,832 | 114D22 | 51,782,368..51,796,858 |
| MHCpara-1 | NOTCH2 | 1p13-p11 | 4Lq25 | 78978 | 84,826..126,901 | 055J23 | 110,037,275..110,044,215 |
| | PBX1 | 1q23 | 4Lq24 | 47606 | 5,480,539..5,556,446 | 036M06 | 99,325,939..99,347,128 |
| | RXRG | 1q22-q23 | 4Lq25 | 78978 | 1,407,934..1,480,769 | 055J23 | 111,399,108..111,408,361 |
| MHCpara-9 | NOTCH1 | 9q34.3 | 8Lp12 | 37448 | 2,529,375..2,559,128 | 030B08 | 4,177,800..4,228,940 |
| | RXRA | 9q34.3 | 8Lp | 255149 | 96,949..211,615 | NA | 5,266,355..5,268,209 |
| | PBX3 | 9q33.3 | 8Lp11 | 403228 | 523,205..572,639 | 020L15 | 11,095,878..11,257,315 |
| | PSMB7 | 9q33.3 | 8Lp11-12 | 3586 | 2,248,619..2,282,130 | 227M14 | 9,754,478..9,780,669 |
| | C5 | 9q33.2 | 8Lp | 86205 | 1,227,102..1,317,602 | NA | 5,816,244..5,865,712 |
| MHCpara-19 | NOTCH3 | 19p13.2 | 3Lq33-34 | 171831 | 677,258..734,233 | 079J11 | 125,881,103..125,938,078 |
| | C3 | 19p13.3 | 3Lq34-35 | 175714 | 455,613..739,326 | 322O09 | 134,274,206..134,300,156 |
| | PSMB6 | 17p13.2 | 3Lq35 | 16004 | 50,127..57,691 | 017J04 | 139,511,604..139,519,183 |
| | PBX4 | 19p13.11 | NA | NA | NA | NA | NA |
| MHCpara-14 (minor) | IgLσ | NA | 1Lq12 | 39437 | 417,923..418,230 | 031N23 | 98,280,301..98,295,182 |
| | TRA | 14q11.2 | 1Lq15 | 29869 | 458,946..459,559 | 039F04 | 140,207,982..140,211,379 |
| | TRD | 14q11.2 | 1Lq15 | 272406 | 116,704..184,681 | 130J21 | 140,946,814..140,951,210 |
| | IgHMC | 14q32-33 | 1Lq14-15 | 13576 | 6,811,972..7,160,435 | 312E22 | 139,040,662..139,059,333 |
| | PSMB5 | 14q11.2 | 1Lq14 | 13576 | 6,389,514..6,394,129 | 244A12 | 138,627,523..138,632,499 |
| | IgLλ | 22q11.22 | 1Lq21 | 162663 | 1..140,765 | 159H19 | 153,417,276..153,418,351 |
| MHCpara-12 (minor) | TAPBPL | 12p13.31 | 7Lp23-24 | 79772 | 4,980,784..7,959,403 | 225A12 | 7,950,550..7,960,609 |
| | LAG3 | 12p13.31 | 7Lp23-24 | 79772 | 5,304,485..5,317,904 | 225A12 | 7,593,489..7,606,908 |
| | CD4 | 12p13.31 | 7Lp23-24 | 79772 | 5,359,817..5,371,805 | 225A12 | 7,539,588..7,551,576 |
| | A2M | 12p13.31 | 7Lp24 | 131666 | 1,275,208..1,307,453 | 307G18 | 5,334,645..5,366,890 |
| | CLEC2B | 12p13.31 | 7Lp24 | 131666 | 693,899..709,661 | 307G18 | 5,932,418..5,948,215 |
Class Ia/Ib and AgR genes

| Gene | Human chr. | X. laevis chr.ᵃ | Scaffold (v7.1) | Position (v7.1)ᵇ | FISH BAC | Position (v9.1)ᵇ | Domains |
|---|---|---|---|---|---|---|---|
| MHC class I and class I–like | | | | | | | |
| 112 | | 1Lq12 | 72621 | 122,476..126,293 | 085N05 | 102,130,692..102,139,541 | a1,2,3; a1,2; a2 |
| 145 | | 8Lq25 | 265107 | 1,565,727..1,581,290 | 012C13 | 87,117,299..87,129,697 | a1,2,3 |
| Class Ia | 6p21.3 | 8Lq21 | 75396 | 164,448..242,219 | 290K18 | 51,482,854..51,498,908 | a1,2,3 |
| XNC | | 8Lq31-32 | 26819 | 3,427,830..3,826,756 | 156D07 | 110,198,845..110,862,792 | a1,2,3 |
| 16004 | | 3Lq35 | 16004 | 123,032..130,911 | 017J04 | 139,582,763..139,592,397 | a1,2,3 |
| CD1 | 1q22-23 | | | | | | a1,2,3 |
| MR1 | 1q25.3 | | | | | | a1,2,3 |
| FCGRT | 19q13.33 | | | | | | a1,2,3 |
| PROCR | 20q11.2 | | | | | | a1,2 |
| ZAG | 7q22.1 | | | | | | a1,2,3 |
| ULBP RAET | 6q25 | | | | | | a1,2,3 |
| AgR-like | | | | | | | |
| 1310 | | 8Lp12 | 127590 | 359,968..365,248 | 209G21 | 1,072,952..1,075,438 | VC |
| 258 | | 8Lq14-21 | 50694 | 22,116..25,167 | 106L10 | 43,808,224..43,815,003 | VC |
| 406 | Lost? (1q22) | 8Lq31-32 | 115163 | Multigene family 221,846..1,754,674 | 033B12 | 104,468,021..106,003,445 | VC |
| PTCRA | 6p21.1 | | | | | | C (loss of V?) |
| IgLκ | 2p12 | 1Lp32-34 | 109418; 3467 | 2,725,506..2,725,994; 177,260..183,220 | 213L05; 146J08 | 9,199,747..9,212,091 | VC |
| TCRβC | 7q34 | 7Lp23-24 | 230427 | 307,269..307,610 | 191H14 | 315,991..316,317 | VC |
| TCRγC | 7p14 | 6Lp12-13 | 19169 | 498,099..551,608 | 045F01 | 62,074,212..62,074,523 | VC |
| NKp30 homolog | | | | | | | |
| NKp30 | 6p21.3 | 4Lq25 | 35524 | Multigene family 2,568,835..2,569,428 | 166F02 | 118,024,408..118,452,984 | V |
| XMIV | (6p21.3) | 8Lq21 | 75398 | Multigene family 1,530,600..1,631,611 | 154P18 | 52,754,412..52,854,193 | V |
ᵃ Mapping location based on v9.1.
ᵇ Beginning..end of positions in the scaffolds.
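The clustering of hallmark genes can be checked mechanically from the X. laevis locations listed in the table above. The sketch below groups the notch, pbx, rxr, and complement family members by chromosome arm. Grouping at the level of the whole arm is a simplification introduced here for illustration (the analysis in the text uses band-level positions), and the function names and the regular expression are hypothetical.

```python
import re
from collections import defaultdict

# X. laevis locations of hallmark genes, copied from the table above.
HALLMARK_LOCATIONS = {
    "notch1": "8Lp12", "notch2": "4Lq25", "notch3": "3Lq33-34", "notch4": "8Lq21",
    "pbx1": "4Lq24", "pbx2": "8Lq21", "pbx3": "8Lp11",
    "rxra": "8Lp", "rxrb": "8Lq14-21", "rxrg": "4Lq25",
    "c3": "3Lq34-35", "c4": "8Lq21", "c5": "8Lp",
}

# Chromosome number, subgenome (L or S), and arm (p or q); bands are ignored.
LOCATION_RE = re.compile(r"^(\d+)([LS])([pq])")


def chromosome_arm(location):
    """Reduce a location such as '8Lq14-21' to its chromosome arm, '8Lq'."""
    match = LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    return "".join(match.groups())


def group_by_arm(locations):
    """Return {arm: sorted list of genes mapped to that arm}."""
    groups = defaultdict(list)
    for gene, location in locations.items():
        groups[chromosome_arm(location)].append(gene)
    return {arm: sorted(genes) for arm, genes in groups.items()}


clusters = group_by_arm(HALLMARK_LOCATIONS)
for arm, genes in sorted(clusters.items()):
    print(arm, ", ".join(genes))
```

Each resulting arm carries at most one member of each gene family, which is the pattern expected for paralogous regions generated by whole-genome duplication. pbx4 is omitted because the table lists its X. laevis location as NA.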
Catalytically active proteasome β subunit genes are all encoded in Xenopus MHCpara
Proteasomes are the most abundant proteins in the cytoplasm and are required for cytosolic protein degradation and recycling pathways (19). Eukaryote proteasomes form a barrel-shaped catalytic tunnel with two identical outer rings composed of seven α-subunits and two identical inner rings composed of seven β-subunits. Only three β-subunits (PSMB5 [LMPX], PSMB6 [LMPY], PSMB7 [LMPZ]) are catalytically active. Upon immune stimulation, expression of three β-subunits, PSMB8 (LMP7), PSMB9 (LMP2), and PSMB10 (MECL1), is upregulated, and these replace the constitutive subunits PSMB5, PSMB6, and PSMB7, respectively, to form the “immunoproteasome” that generates peptides preferable for class I binding (19). Because some prokaryotes possess only one type of β-subunit, it has been proposed that the genes encoding the catalytically active β-subunits, psmb5, 6, and 7, were generated by cis-duplication in a eukaryote ancestor and were likely present in the proto-MHC (20, 21); indeed, β-subunit genes are found in linkage groups with MHC framework genes in preduplicated genomes such as those of the lower deuterostome amphioxus (10, 21) and the placozoan T. adhaerens (11). All three immunoproteasome genes psmb8, 9, and 10 are encoded in the MHC of many ectothermic vertebrates (12, 22). In humans, only PSMB8 and PSMB9 are found in the MHC (chr 6), and PSMB10 on human chr 16 is the result of translocation out of the MHC. Likewise, the constitutive proteasome gene PSMB7 maps to huMHCpara-9 (i.e., huMHCpara chr 9) (light blue boxes in Fig. 1, Table I), but other PSMB genes were proposed to have been translocated from their original location to other genomic regions outside MHCpara (20).
We found that Xenopus psmb6 maps to 3Lq35, in the vicinity of c3 and notch3, a region corresponding to huMHCpara-19, and we previously reported that Xenopus psmb10 maps in the MHC class III region (Fig. 1, Table I) (12), suggesting that the translocation of psmb6 and psmb10 occurred after the amphibian–mammal divergence. PSMB5 is found on human chr 14q11.2 in the vicinity of TCRA/D (14q11.2) and near the IgH chain (14q32.33) loci. This synteny is well conserved in Xenopus, with psmb5 on chr 1Lq14-15, near tcra/d (1Lq15), igh (1Lq14-15), and igl (λ and σ) (Fig. 1, Table I). As mentioned above, the distribution of human insulin-relaxin genes (5) indicates that this region of human chr 14 is a genetic fragment that was originally linked to an MHC precursor but was translocated during vertebrate evolution; it is designated as a minor MHCpara (6, 7, 20) (Fig. 1, Table I). In summary, unlike in humans, all Xenopus psmb genes encoding catalytic proteasome β subunits map to major or minor MHCpara.
Xenopus MHC class I genes map to the descendants of huMHCpara-6/19 precursor
In Xenopus, a single classical class I (class Ia) gene maps to the MHC (23), whereas a cluster of nonclassical class I (class Ib) genes (xnc) (24, 25) was previously mapped to the telomeric region of the MHC chromosome (18). We now report three additional nonclassical class I genes in the Xenopus genome, designated class Ib112, class Ib16004, and class Ib145 based on their original scaffold numbers in ver 4.1 (Table I). All three are single-copy genes on L chromosomes with typical class I domain structures, but the deduced amino acid sequences lack the evolutionarily conserved peptide-binding residues found in all classical class Ia molecules (Supplemental Fig. 1A); note that class Ib112 is highly divergent from class Ia (see below). In addition, consistent with their designation as nonclassical class I genes, these three class I genes are monomorphic (data not shown), show tissue-specific expression, and are expressed at much lower levels than class Ia (Supplemental Fig. 1E).
Whereas Xenopus MHC class Ia and the xnc cluster map to 8Lq21 and 8Lq31-32, respectively, the class Ib145 gene maps between the MHC and xnc (green box in Fig. 1, Table I). Based on phylogenetic analyses, the class Ib145 gene is intermediate in similarity to the Xenopus class Ia and class Ib genes (Supplemental Fig. 2). Interestingly, the class Ib145 gene is surrounded by genes mapping to human chr 14q13.2 (Supplemental Table I), near huMHCpara-14. The class Ib16004 gene, most related to the xnc genes (Supplemental Fig. 2), maps very near (only four genes apart) to psmb6 on 3Lq35 in an MHCpara (Fig. 1, Table I). The human class Ib gene FCGRT, encoding the p51 subunit of the neonatal IgG Fc receptor (FcRn), is found in a similar gene location to Xenopus class Ib16004, but we could not establish orthology between these two genes by phylogenetic analysis or synteny (Supplemental Fig. 2). However, the synteny of the genes between class Ib16004 and psmb6 with human chr 17p13 is conserved (probability by chance: 3.33 × 10⁻¹⁶, Table II), further cementing the ancient class I–proteasome gene linkage. Most likely, this part of the MHCpara was translocated later in the vertebrate lineage.
| Region | No. of Genes in Hypothesized Human Regionᵃ | pᵇ | Homologs in Xenopus Region | Total in Xenopus Region | Probability Human and Xenopus Share Genes by Chance |
|---|---|---|---|---|---|
| VJC11310 | 327 | 1.62 × 10⁻² | 172 | 220 | 3.89 × 10⁻¹⁵ |
| Class Ib112 | 1158 | 5.73 × 10⁻² | 139 | 279 | <1 × 10⁻¹⁶ |
| MHC | 216 | 1.07 × 10⁻² | 88 | 106 | <1 × 10⁻¹⁶ |
| MHC without butyrophilins | 150 | 7.43 × 10⁻³ | 85 | 103 | <1 × 10⁻¹⁶ |
| Class Ib16004 | 35 | 1.73 × 10⁻³ | 23 | 51 | 3.33 × 10⁻¹⁶ |
| GP1BB | 181 | 8.96 × 10⁻³ | 66 | 78 | <1 × 10⁻¹⁶ |
ᵃ Based on the human reference GRCh38, with 20,199 total genome-wide protein-coding genes.
ᵇ Proportion of the human genome found in the hypothesized syntenic region.
Most conspicuously, the Xenopus class Ib112 gene maps between psmb5 and IgL on Xenopus chr 1Lq12 (Fig. 1, Table I), the region corresponding to the minor huMHCpara-14 described above that also contains TCRA/D and IgH/L genes. Consistent with its location on the ancient paralogue, class Ib112, like CD1, clusters outside of all other vertebrate class Ia and class Ib genes in the maximum likelihood phylogenetic tree, and somewhat less so in the neighbor-joining tree (Supplemental Fig. 2). We detected reptilian class I genes orthologous to Xenopus class Ib112 (Fig. 2A) that, where it was possible to examine, also map to this interesting paralogous region (Fig. 2B). Upon closer examination of the Xenopus chr 1L region, we found that class Ib112 is surrounded by genes that map to human chr 19p13 (Supplemental Fig. 3). This conservation of synteny is statistically supported, with a probability by chance of <1 × 10⁻¹⁶ (Table II). It should be noted that the so-called UT class Ib genes in opossum (26) (also with reptilian orthologs) are also linked to the psmb10 gene in an MHCpara (GenBank accession NC_008801.1: region 685896657-705364100 [www.ncbi.nlm.nih.gov]). In summary, all three Xenopus class Ib genes map to MHCpara most likely derived from the chr 6/19 precursor, and two of them are linked to genes encoding constitutive catalytic proteasome β subunits.
Note that the positions of class Ib16004 and class Ib145 in the phylogenetic trees do not conform well to the ancient origins that we propose (Supplemental Fig. 2). At least in the case of class Ib145, its location on the same chromosome as the xnc and MHC might subject it to gene conversion events that blur its age (e.g., the high similarity of class Ia to class Ib145 in the N-terminal region of the α2 domain and low similarity in the rest of the molecule, Supplemental Fig. 1A). Because class Ib16004 lies in a paralogous region on a different chromosome from MHC/XNC, its clustering with Xenopus xnc class Ib genes in the trees is difficult to reconcile with its proposed origin at 1R. Considering the numerous class Ib genes in the frog genome (25), we speculate that there may be opportunities for gene conversion or other unknown mechanisms even among nonhomologous chromosomes.
Evidence of en bloc translocation of MHCpara and identification of MHCtrans
As mentioned above, a large cluster of xnc class Ib genes maps to the telomere of the Xenopus MHC chr 8Lq31-32 (18), which is not assigned as an MHCpara (Figs. 1, 3, Table I, Supplemental Table I). In the MHC of Xenopus and other nonmammalian vertebrates, low numbers (or only one) of class Ia genes (22) are closely linked to the polymorphic psmb and tap genes (27, 28), forming a primordial “class I region” (29). Coevolution among the genes in the class I region has been suggested: strong linkage disequilibrium has been found among these genes in bony fish [psmb and class Ia (medaka) (30) and psmb, tap, and class I (zebrafish) (31)], and coevolution has been shown functionally in birds [tap and class Ia (32)]. The XNC loci were likely generated via cis-duplication of MHC class I genes and subsequent translocation to a telomeric location, perhaps to limit recombination/gene conversion between the single MHC class Ia gene and the class Ib (xnc) genes. A similar organization is found for the chicken MHC (B locus), where class Ib genes, along with several class II genes, map separately from the MHC in the telomeric region of the same chromosome (Y or Rfp-Y locus) (33) (see below). This secondary region also presumably arose by cis-duplication of MHC genes followed by translocation, but the situation in frogs and chicken is thought to have developed via convergent evolution. We further predict that the splitting of Xenopus class Ib genes from the MHC to the telomere likely allowed expansion of xnc genes and drove neofunctionalization. For example, xnc10-restricted NKT-like cells have been identified in Xenopus (34), and other xnc genes have prospective NKT partners (35, 36).
We found that the XNC region contains many genes mapping to human chromosomal region 1q21.1–23.3 (Fig. 3B, Supplemental Table I), specifically a block region surrounding the CD1 genes (dotted box in Fig. 1). Previously, the 1q21.1–23.3 region was proposed to be a part of huMHCpara-1 (37). However, the proposed MHCpara regions are spread broadly over human chr 1, presumably because of a pericentric inversion on this chromosome (more details below), and thus the integrity of the conservation of huMHCpara-1 has been questioned (37).
CD1 molecules are similar to MHC class Ia in their protein structure, association with β-2 microglobulin, and Ag-presentation capacity (38, 39). CD1 molecules, however, do not present peptide Ags to conventional T cells but rather lipid Ags to unconventional T cells such as NKT cells and γδT cells, and thus are categorized as class Ib (40). Unlike MHC class Ia, which is expressed ubiquitously, CD1 expression is usually limited to APC, and the CD1 Ag-loading machinery is similar to that of MHC class II (41). It was originally proposed that CD1 genes were generated during 2R and subfunctionalized (42). However, the discovery of cd1 genes in the chicken MHC did not conform well to the 2R hypothesis (43–45). So far, two major hypotheses have been proposed to explain the timing of cd1 emergence and genome evolution. Salomonsen et al. (44) proposed that cd1 was generated by tandem duplication of MHC genes at the primordial state (0R), rather than being a direct product of 2R, and that paralogous copies were silenced in all paralogous regions during the genome duplications. Miller et al. (45) proposed that cd1 may have arisen more recently and that cd1 genes were later translocated to an MHCpara in mammals. The discovery that cd1 genes map to the Chinese alligator counterpart of huMHCpara-19 (46) (Fig. 4, Supplemental Table I) strongly suggests that cd1 arose pre-2R (reviewed in Refs. 47 and 48). Our discovery that the human chr 1q21.1–23.3 region contains genes whose Xenopus counterparts map to the XNC locus suggests a compromise scenario in which the block of human 1q21.1–23.3 genes, including CD1, was the result of a secondary translocation following the intrachromosomal translocation from the MHC (Fig. 4).
One caveat is the synteny of cd1 genes in birds: in different bird species, the cd1 genes are found in linkage groups that are not consistent with each other, and most of them are not in MHCpara (Supplemental Table I): human chr 1q25 (mallard and swan goose); 9q22.31 (egret, pigeon, crow, finch, manakin, killdeer, falcon, cuckoo, ibis); and 6q22.31 (eagles). If the synteny on 1q25 and 9q22.31 represents the original location, MHC class I could have existed even in the 0R ancestor (Fig. 1).
In this article, we propose the following scenario (Fig. 4): cd1 was generated by tandem duplication from an MHC class I/II precursor, most likely pre-2R. Subsequently, the class I/II/cd1 genes were cis-duplicated and a block region was translocated to the telomeric region (translocated part of the MHCpara region [MHCtrans]), which allowed expansion of class Ib/cd1 genes. Later, a block region was further translocated to human chr 1q21.1–23.3, coincidentally in huMHCpara-1. During the process, MHC and CD1 loci experienced differential gene loss (loss of MHC class II and CD1 in Xenopus MHCtrans, and loss of MHC genes on human chr 1q21.1–23.3). Finally, expansion of certain genes occurred (class Ib genes [xnc] in Xenopus and CD1 genes in mammalian species including humans). Because most genes mapping to human chr 1q21.1–23.3 are in the Xenopus XNC region [including KIRREL (49)], whereas all hallmark genes for huMHCpara-1 map to Xenopus 4Lq24-25, with no homologs shared between the XNC and 4Lq24-25 regions, translocation seems to be the simplest explanation. Note that the 3′-end of this translocation is at the telomere (Fig. 3A, Supplemental Table I), and the 5′-end contains large clusters of olfactory (OR) and vomeronasal (VNR) genes; both the telomere and the repetitive genes may have played a role either in the translocation (especially the telomeric location) or in the original duplication.
To further examine the evolutionary timing of the en bloc translocation of the 1q21.1–23.3 region, we searched for huMHCpara-1 orthologous regions in several representative vertebrates (Fig. 5A). As mentioned earlier, the huMHCpara-1 spreads onto both arms of chr 1, proposed to be partially a result of a pericentric inversion (37). For example, hallmark genes are split onto both arms of chr 1: NOTCH2 maps to 1p13-p11, whereas RXRG and PBX1 map to 1q23.3 (Fig. 5B). Similarly, notch2 maps separately from rxrg and pbx1 in the opossum genome. However, hallmark genes are closely linked in all nonmammalian species (on chr 8 in chicken; on chr 4q25 in Xenopus; and in the elephant shark genome) (Fig. 5B), suggesting that the pericentric inversion must have occurred in a mammalian ancestor. As in Xenopus, genes orthologous to those on human chr 1q21.1–23.3 are found on a separate chromosome in chicken (chr 25). Therefore, in both chicken and Xenopus, the region orthologous to 1q21.1–23.3 lies on a different chromosome from the huMHCpara-1 hallmark genes, and thus it seems likely that the translocation of the 1q21.1–23.3 region occurred after the bird–mammal separation (Fig. 5A). Note that unlike in Xenopus, the chicken MHC is not found on chr 25 (rather on chr 16); however, both chr 16 and 25 are microchromosomes, and we predict that these two chromosomes were split during bird evolution. There seem to have been different genome modifications among mammalian species, with multiple chromosomal breakpoints arising before the rodent/artiodactyla divergence (data not shown).
In summary, we propose that the CD1 region in mammals is a result of a translocation event, by chance, into huMHCpara-1, and thus there is no strong evidence of class I genes on MHCpara-1 or -9. This is consistent with our hypothesis that a class I precursor gene may have arisen after 1R on only one of the duplicated chromosomes, chr 6/19 (Figs. 4, 6). Contrary to the existing hypothesis that class II predates class I (50–53), we further propose that class I emerged first in evolution because we have not found MHC class II genes anywhere outside the bona fide MHC or paralogous regions (54).
When did the original MHCtrans (red arrow in Fig. 4) arise in evolution? We found it in amphibians (Fig. 5A), but it may be older. Families of class Ib genes in cartilaginous fish that are currently unmapped (55) may be a part of this original MHCtrans. Besides class I and AgR-like genes (see below) in MHCtrans, other immune-related genes such as fcrl (56, 57) and slamf (58) are also found in this region (Figs. 3, 5). Unlike class I and AgR, however, slamf and fcrl per se are not found in bona fide MHCpara and thus likely emerged soon after 2R in early vertebrates. We further predict that their origin, most likely, is from constant (C) 2–type IgSF precursors that were present in the PIC (e.g., KIR genes found on huMHCpara-19 are also derived from these precursors).
Emergence of AgR precursor in the PIC
Linkage of TCR- and Ig-like genes in association with the primordial MHC has been previously suggested (59, 60). AgRs bear a rare, specialized C1-type IgSF domain (61) like those found in MHC class I/II, and thus one might predict their linkage to the primordial MHC. Human TCRA/D genes are found near PSMB5 (chr 14q11 in Fig. 1, Table I), also suggesting ancestral linkage of TCR to MHC. In the Xenopus genome, in addition to the close linkage of tcrad-psmb5, the igh locus (62) and igl genes (especially the λ isotype) are closely linked (Xen1q in Fig. 1). These locations strongly support the ancestral linkage of precursor AgR genes to the proto-MHC.
AgRs have a variable (V) domain with a signature IgSF “G” β-strand encoded in a separate element; in the germline of the simplest IgL, the V element encodes strands “A–F” and the J (joining) element encodes the “G” strand (61) (also shown in Supplemental Fig. 1B, 1C). It has been proposed that genes containing a single uninterrupted VJ element (i.e., exon) were present in the primordial MHC, near genes encoding C1-IgSF domains. Genes encoding these VJ and C1 domains likely combined to become AgR precursors (59, 60), and the RAG transposon (63–65) split one of the single VJ genes into separate V- and J- genetic elements (V-J). One candidate for such a precursor is the NCR3 gene encoding NKp30 (66). NCR3 contains a single VJ exon and maps to the MHC in most studied vertebrates (67). In Xenopus, a cluster of ncr3 genes maps to an MHCpara, 4Lq25 (68), whereas another set of genes with exactly the same domain structure (xmiv) maps to the MHC (12) (dark purple boxes in Fig. 1, Supplemental Fig. 3). Whether or not ncr3 is immediately related to the ancestor of the AgR precursor, the xmiv and ncr3 genes are clearly derived from a common VJ precursor gene that was linked to the primordial MHC (Fig. 6, Supplemental Fig. 3) (67). Recently, genes with VJ-C2 structure were discovered in amphioxus (lancelet), an invertebrate deuterostome (69). Whether these genes are immediate relatives of the VJ ancestor or are divergent descendants is debatable; however, one of the lancelet VJ-C2 genes maps adjacent to the kirrel gene, which maps next to CD1 genes in human chr 1q (dotted red box in Fig. 1), strongly supporting its relationship to the VJ precursor.
In addition to the previously identified IgH and L chains, and all four types of TCR genes, there are three novel Xenopus genes that encode a single VJ domain and a C1-IgSF domain, like TCR or IgL chains in a “pre-RAG transposon” state. All three genes are found in MHCpara and we designate them VJC1258, VJC1406, and VJC11310 based on their domain structure and scaffold number in ver 4.1 (light purple boxes in Fig. 1, Table I). VJC11310 is a single-copy gene (Supplemental Fig. 1B) mapping to Xenopus MHCpara-8Lp11-12. Preliminary BlastP analysis exhibited high identity with IgL from various vertebrates with highest similarity to the anole lizard (∼4 × 10^−31) and spiny dogfish (shark; 5 × 10^−25). VJC11310 was previously reported to be a “germline-joined igl chain” (GenBank accession ACB47447 [www.ncbi.nlm.nih.gov]) (70). However, we mapped all three known rearranging IgL isotypes (λ, κ, σ) to Xenopus chr 1, whereas VJC11310 maps to a different MHCpara region (surrounding genes mapping in the huMHCpara-9 [Supplemental Fig. 3]; linkage probability by chance 3.89 × 10^−15 [Table II]), making it highly unlikely that VJC11310 is a bona fide IgL. VJC1258 is also a single-copy gene (Supplemental Fig. 1C), maps upstream of the MHC, and is expressed in the Xenopus thymus (by northern blotting, data not shown). BlastP analysis using the VJ domain exhibited highest identity with IgL from various vertebrates with the highest match to coelacanth (4 × 10^−31) and large flying fox (2 × 10^−30), whereas the C domain matched various cartilaginous fish IgH and IgL much more weakly, with higher E-values ranging from 1 × 10^−9 to 9 × 10^−5. The PreTα (PTCRA) gene, which encodes a single C1-IgSF domain and is so far found only in mammalian species (71), also maps upstream of the human MHC (striped box in Fig. 1). The prediction is that PTCRA originally had a V(J) domain, but it was lost in evolution (72).
It is possible that Xenopus VJC1258 was related to a precursor of preTα before loss of the V(J) domain, but phylogenetic analysis of VJC1258 and all AgR including preTα did not support this scheme (data not shown). Moreover, BlastP analysis using the C domain did not select PreTα in any other species, suggesting VJC1258 is not closely related to preTα. Regardless of their function and orthology to other genes, mapping of these AgR-like genes to all MHCpara strongly supports the idea that an AgR precursor was present at the 0R stage (i.e., PIC) (Fig. 6).
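As a reading aid for the BlastP results quoted above, the following minimal Python sketch ranks the reported E-values, on the standard understanding that a smaller E-value indicates a stronger match. The dictionary labels and the helper name `rank_by_evalue` are illustrative only and are not part of the study's analysis pipeline; the numbers are copied from the text.

```python
import math

# E-values quoted in the text for the BlastP searches (smaller = stronger match).
# Labels are illustrative; only the numeric values come from the text.
hits = {
    "VJC11310 vs anole lizard IgL": 4e-31,
    "VJC11310 vs spiny dogfish IgL": 5e-25,
    "VJC1258 VJ domain vs coelacanth IgL": 4e-31,
    "VJC1258 VJ domain vs large flying fox IgL": 2e-30,
    "VJC1258 C domain, strongest reported": 1e-9,
    "VJC1258 C domain, weakest reported": 9e-5,
}


def rank_by_evalue(evalues):
    """Return (label, E-value) pairs sorted from strongest to weakest match."""
    return sorted(evalues.items(), key=lambda item: item[1])


ranked = rank_by_evalue(hits)

# Gap, in orders of magnitude, between the best VJ-domain hit for VJC1258
# and the best C-domain hit for the same gene.
gap = math.log10(
    hits["VJC1258 C domain, strongest reported"]
    / hits["VJC1258 VJ domain vs coelacanth IgL"]
)

for label, evalue in ranked:
    print(f"{evalue:8.0e}  {label}")
print(f"VJ vs C domain gap: about {gap:.0f} orders of magnitude")
```

Read this way, the VJ domain of VJC1258 is a far stronger IgL match than its C domain, which is the contrast the text draws.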
We also mapped a cluster of VJC1406 genes (Supplemental Fig. 1D) to the scaffolds with xnc genes in the MHCtrans region along with the genes mapping to human 1q21.1–23.3 (Fig. 3A, Supplemental Table I). Again, linkage of MHC class I to AgR-like genes is clear. We found VJC1406 orthologs in many species of reptiles, birds, and other species; during preparation of our article, VJC1406 orthologs were reported from chicken and named PRARP. PRARP were likely lost in mammals and teleost fish but are present in coelacanth and likely in sharks (73). The authors did not conclude that PRARP were AgR-like genes or MHC associated, but the chicken prarp genes were expressed in lymphocytes and thus potentially have an immune function, and they were proposed as candidates for invasion by the RAG transposon. Regardless of their functions, their synteny is well conserved among different vertebrate species (73) (Fig. 3B). In our study, we found a clear linkage of this gene family to MHC class I genes in the MHCtrans region of lower vertebrates (Fig. 3), further supporting the hypothesis that VJ-IgSF genes were present in the PIC.
In summary, VJ- and C1-IgSF–containing AgR-like genes are present in both major and minor MHCpara regions and MHCtrans, showing that they were present in the PIC before 1R. The consistent linkage of AgR-like and MHC class I genes on chromosomes derived from chr 6/19 after 1R further demonstrates that the presence of AgR precursors in the PIC predates the emergence of bona fide MHC class I genes (Fig. 6).
Evolution of TCR genes
In a previous study, we proposed a scenario for the evolutionary emergence of TCRD/A and IgH genes (6, 62). In this study, we further examined the genome evolution of the TCRB/G genes. Whereas TCRA and TCRD genes are encoded in the minor huMHCpara-14, TCRB and TCRG genes map at opposite ends of human chr 7 (Fig. 7A, Table III). Hood and colleagues (74) proposed that this split arrangement is an evolutionarily derived situation, and that TCRB and TCRG had originally been closely linked, like the extant TCRA/D genes, but were separated via a pericentric inversion. In Xenopus, tcrb and tcrg are found on different chromosomes (tcrb 7Lq23-24; tcrg 6Lp12-13 [Table III]). However, the Xenopus tcrb gene maps near tapbpl and cd4/lag3 genes, which are found in the NK cell complex (NKC) on human chr 12p13.31 (Figs. 1, 7B, Table III). The NKC is also considered a minor MHCpara, based on 1) the presence of the marker gene A2M (homolog of C3,4,5) (6); 2) the presence of the TAPBP paralogue, TAPBPL (75) (TAPBP maps to the MHC); 3) mapping of chicken C-type lectin NK receptor genes to the MHC (6, 76, 77), whereas the C-type lectin NK receptor genes map to the mammalian NKC; and 4) studies of neurotrophin gene distribution in jawed vertebrates (7). Thus, tcrb linkage to an MHCpara also suggests an ancestral linkage of TCR precursor genes to the primordial MHC. In contrast, Xenopus tcrg may have been translocated to an unrelated region (chr 6) having no connection to the MHCpara.
Species | Chromosome | Gene | Position (a) |
---|---|---|---|
Human | 12p13.2 | KLRD1 (NKC) | 10,238,385..10,329,607 |
Human | 12p13.31 | A2M | 9,067,708..9,115,962 |
Human | 12p13.31 | CD4 | 6,789,472..6,820,810 |
Human | 12p13.31 | LAG3 | 6,772,483..6,778,455 |
Human | 12p13.31 | TAPBPL | 6,451,655..6,472,006 |
Human | 7q34 | TCRβ | 142,299,011..142,813,287 |
Human | 7p14.1 | TCRγ | 38,240,024..38,368,055 |
Pig | 5 | klrd1 | 61,583,868..61,596,985 |
Pig | 5 | a2m | 65,274,903..65,320,342 |
Pig | 5 | cd4 | 66,326,568..66,353,856 |
Pig | 5 | lag3 | 66,364,099..66,369,484 |
Pig | 5 | tapbpl | 66,649,711..66,658,562 |
Pig | 18 | tcrβ | 7,715,206..7,823,795 |
Pig | 9 | tcrγ | 119,542,537..119,635,982 |
Mouse | 6 | a2m | 121,636,166..121,679,238 |
Mouse | 6 | cd4 | 124,864,693..124,888,248 |
Mouse | 6 | lag3 | 124,904,359..124,912,434 |
Mouse | 6 | tapbpl | 125,223,927..125,231,923 |
Mouse | 6 | klrd1 | 129,588,092..129,598,775 |
Mouse | 6 | tcrβ | 40,891,296..41,558,371 |
Mouse | 13 | tcrγ | 19,178,042..19,356,476 |
Opossum | 8 | a2m | 104,682,643..104,771,506 |
Opossum | 8 | cd4 | 108,220,454..108,260,998 |
Opossum | 8 | lag3 | 108,170,654..108,179,156 |
Opossum | 8 | klrk1 | 113,517,720..113,533,133 |
Opossum | 8 | tcrβ | 205,270,812..205,335,586 |
Opossum | 6 | tapbpl | 290,987,908..290,993,624 |
Opossum | 6 | tcrγ | 283,848,252..283,942,577 |
Chicken | 1 | a2m | 76,229,983..76,255,770 |
Chicken | 1 | tapbpl | 76,876,884..76,889,825 |
Chicken | 1 | lag3 | 77,194,590..77,202,789 |
Chicken | 1 | cd4 | 77,208,503..77,219,970 |
Chicken | 1 | tcrβ | 78,071,772..78,072,534 |
Chicken | 1 | klrdr1 | 78,423,947..78,430,724 |
Chicken | 2 | tcrγ | 49,292,467..49,295,949 |
Turkey | 1 | tcrβ | 74,734,696..74,742,685 |
Turkey | 1 | lag3 | 75,575,531..75,581,610 |
Turkey | 1 | cd4 | 75,588,055..75,599,611 |
Turkey | 1 | tapbpl | 75,900,408..75,911,755 |
Turkey | 1 | a2m | 79,842,550..79,855,332 |
Turkey | 6 | tcrγ | 47,636,597..47,652,020 |
Salmon | 2 | tapbpl | 10,225,084..10,231,543 |
Salmon | 2 | klrd1 | 24,161,287..24,177,629 |
Salmon | 2 | cd4-2 | 30,978,314..30,984,887 |
Salmon | 2 | cd4-1 | 30,986,632..31,013,703 |
Salmon | 9 | a2m | 108,156,058..108,189,009 |
Salmon | 1 | tcrβ | 3,348,168..3,354,302 |
Salmon | 20 | tcrγ | 9,074,301..9,083,342 |
Zebrafish | 16 | tapbpl | 9,899,183..9,911,977 |
Zebrafish | 16 | cd4-1 | 12,021,001..12,055,289 |
Zebrafish | 16 | cd4-2 | 12,057,069..12,072,262 |
Zebrafish | 16 | clec | 29,030,785..29,042,169 |
Zebrafish | 15 | a2m | 21,178,237..21,196,748 |
Zebrafish | 17 | tcrβ | 48,395,034..48,401,797 (C) |
Zebrafish | 2 | tcrγ | 31,873,021..31,902,832 (V) |
(a) Beginning..end of positions in chromosomes.
We decided to further examine the linkage status of TCRB and TCRG genes in other vertebrate genomes. Other mammals (e.g., pig, mouse), besides humans, have a linkage of TCRB to NKC genes (Fig. 7A, Table III). Linkage of tcrb to the NKC is also seen in birds (e.g., chicken and turkey). Linkage of tcrb to the NKC has not been documented in bony fish: In the primitive bony fish, spotted gar, tcrb is linked to genes on human chr 14q24.1 and 15q15 on LG7, whereas a2m and tapbpl map to LG26. Synteny of tcrg is conserved among vertebrate species; like Xenopus, tcrg was found on a separate chromosome in all nonprimate species. However, in opossum, tcrg is linked to tapbpl, suggesting a remnant linkage of tcrg to NKC.
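The same-chromosome pattern described above can be tabulated directly from Table III. The sketch below is a minimal Python illustration using only a subset of the table rows (mouse, opossum, chicken, zebrafish). The function name `same_chromosome_partners` and the ASCII gene labels `tcrb`/`tcrg` (for tcrβ/tcrγ) are assumptions made for this example. Sharing a chromosome is only a crude proxy for linkage, so this is not the analysis used in the study.

```python
from collections import defaultdict

# (species, chromosome, gene) rows copied from a subset of Table III.
# ASCII names tcrb/tcrg stand in for tcrβ/tcrγ.
ROWS = [
    ("Mouse", "6", "a2m"), ("Mouse", "6", "cd4"), ("Mouse", "6", "lag3"),
    ("Mouse", "6", "tapbpl"), ("Mouse", "6", "klrd1"), ("Mouse", "6", "tcrb"),
    ("Mouse", "13", "tcrg"),
    ("Opossum", "8", "a2m"), ("Opossum", "8", "cd4"), ("Opossum", "8", "lag3"),
    ("Opossum", "8", "klrk1"), ("Opossum", "8", "tcrb"),
    ("Opossum", "6", "tapbpl"), ("Opossum", "6", "tcrg"),
    ("Chicken", "1", "a2m"), ("Chicken", "1", "tapbpl"), ("Chicken", "1", "lag3"),
    ("Chicken", "1", "cd4"), ("Chicken", "1", "tcrb"), ("Chicken", "1", "klrdr1"),
    ("Chicken", "2", "tcrg"),
    ("Zebrafish", "16", "tapbpl"), ("Zebrafish", "16", "cd4-1"),
    ("Zebrafish", "16", "cd4-2"), ("Zebrafish", "16", "clec"),
    ("Zebrafish", "15", "a2m"), ("Zebrafish", "17", "tcrb"),
    ("Zebrafish", "2", "tcrg"),
]


def same_chromosome_partners(rows, target):
    """Map each species to the listed genes sharing a chromosome with `target`."""
    by_species = defaultdict(dict)
    for species, chrom, gene in rows:
        by_species[species][gene] = chrom
    partners = {}
    for species, genes in by_species.items():
        if target not in genes:
            continue  # target gene not listed for this species
        target_chrom = genes[target]
        partners[species] = sorted(
            g for g, c in genes.items() if c == target_chrom and g != target
        )
    return partners


tcrb_partners = same_chromosome_partners(ROWS, "tcrb")
tcrg_partners = same_chromosome_partners(ROWS, "tcrg")

for species in tcrb_partners:
    print(species, "tcrb:", tcrb_partners[species], "| tcrg:", tcrg_partners[species])
```

On this subset, tcrb shares a chromosome with NKC-associated genes in mouse, opossum, and chicken but not in zebrafish, and tcrg shares a chromosome with tapbpl only in opossum.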
In summary, the combined data favor the existing hypothesis that TCRB and TCRG were indeed originally linked in minor huMHCpara-12, followed by a chromosome split to human chr 7 and secondary translocation of a block region containing TCRG (Fig. 7B). Alternatively, TCRB and TCRG were differentially silenced after translocation from their original location. In either scenario, the splitting up of the two genes and subsequent translocation(s) were involved in positioning tcrb and tcrg at opposite ends of human chr 7.
Based on the distribution of the orthologous genes found on Xenopus chr 1q (Supplemental Fig. 3), we speculate that huMHCpara-12 split from huMHCpara-14. Also, a block region containing the iglλ gene (human chr 22q11) is derived from huMHCpara-14 (linkage probability by chance <1 × 10^−16 [Table II]). Therefore, our analysis suggests that all rearranging AgR are likely derived from the huMHCpara-19 precursor. Invasion of the RAG transposon likely happened on huMHCpara-19 after 2R, splitting the VJ element into separate V and J elements, and the various pairs of AgR genes are suggested to have been generated via cis duplications. This theme is discussed further below (Fig. 8).
Discussion
We have conducted a genome survey for loci involved in adaptive immunity and propose hypotheses for the origins of the PIC (Fig. 6). We also uncovered evidence of an en bloc translocation of the loci surrounding the CD1 genes (Figs. 4, 5A). Finally, we provide compelling evidence for the timing of the emergence of MHC class I(/II) and AgR in a gnathostome ancestor (Figs. 6, 7B) and have uncovered nonrearranging AgR-like genes in MHCpara that may be related to the Ig/TCR ancestor.
Emergence of IgSF Ag receptors and PIC
It has been previously predicted that AgR precursor genes were linked to the proto-MHC and translocated later in evolution (59, 60, 78). To address this hypothesis, we mapped AgR/AgR-like genes on Xenopus chromosomes and uncovered several nonrearranging genes with structures similar to TCR and IgL chains: a single uninterrupted VJ-type IgSF domain followed by a C1-IgSF domain. It has also been speculated (60) that modern AgRs were generated by recruitment of C1-IgSF in the preadaptive immune complex followed by the RAG transposon splitting a VJ gene into V- and J- genetic elements (V-J). Thus, extant VJ-IgSF–containing genes are potentially descendants of such precursor genes (69, 73). Like other immune genes directly involved in Ag recognition, all AgR-like genes described in this report are diploidized in the tetraploid X. laevis, and therefore they likely play roles in immunity (18, 73). As mentioned above, NCR3, another gene encoding a VJ-type domain, maps to the human (and other vertebrates including sharks [M.E. Janes, L. Du Pasquier, M.F. Flajnik, and Y. Ohta, manuscript in preparation]) MHC (Fig. 1), and an amphioxus VJ gene (69) linked to a kirrel homolog further supports the hypothesis that the AgR precursor was present in the PIC at 0R.
Mapping of Ig and TCR genes in several vertebrates to MHCpara indicates that all of the extant AgR seem to be derived from an ancestral chr 19 paralogue. This suggests that an uninterrupted VJ element was split by the RAG transposon, and after gene duplication, one duplicate acquired a diversity (D) element, generating paired receptor genes (74). Hood et al. (79) suggested an ancestral VJ homodimer, which, after the RAG transposon invasion and gene duplication, gave rise to a heterodimeric receptor. As proposed by Davis and Bjorkman (80), the original receptor may have been TCR α/β-like, because the RAG rearrangement break at CDR3 makes the most sense for an MHC-restricted AgR (i.e., the most diverse part of the AgR binding to the true Ag, peptide, or another original type of Ag) in the MHC groove. We previously proposed (59) that the original AgR was derived from NK-like receptors that recognized MHC-like molecules encoded either in the PIC or the proto-MHC, and we now provide evidence for such candidate receptors. Subsequent duplication of the paired TCR genes and translocation may have relieved the pressure of MHC restriction, allowing the duplicated receptor to bind free Ags, like γ/δ TCR today. Another duplication in cis may have occurred [as previously suggested (62)] on huMHCpara-14, generating IgH/L by a cis-duplication of the neighboring (TCRA/D) pair: the two sets of loci (TCRA/D and IgH/L) are still linked in extant vertebrates including Xenopus (62).
Class I, CD1, and class II
We also identified novel class I genes and mapped them in MHCpara derived from the chr 6/19 precursor after 1R. Our analyses suggest that MHC class I likely arose after the first round of genome duplication rather than prior to 1R (Fig. 6). The previous proposals (43–45) were partially supported by the presence of CD1 genes on huMHCpara-1. In contrast, we present evidence that the 1q21.1–23.3 region, including the CD1 genes, was secondarily translocated from another location, which itself was translocated from the MHC (MHCtrans) (red arrow in Fig. 6); thus, the presence of CD1 on huMHCpara-1 was likely the result of a chance event and not a genome-wide duplication. There is, however, an alternative explanation: both the MHC and MHCtrans may have been duplicated onto both loci on chr 1 and 6 during 2R and then differentially silenced. We think this scenario is unlikely because some housekeeping genes would have remained in other MHCpara as homologs, as we commonly see in the tetraploid X. laevis genome compared with the diploid X. tropicalis (14). KIRREL homologs, KIRREL 2 (19q13.12) and KIRREL 3 (11q24.2), are found in major and minor huMHCpara (68, 78), whereas KIRREL maps to human chr 1q23.1 and, in Xenopus, to MHCtrans. Furthermore, kirrel maps adjacent to notch in the Drosophila genome, presumably an ancestral linkage (16). Although this is only one example, the distribution of KIRREL genes adds another layer of support to our hypothesis that the MHCtrans was initially translocated from the MHC (Fig. 5A). The presence of a cd1 gene in Chinese alligator on huMHCpara-19 (46) further suggests that CD1 emerged after 1R but before 2R and was differentially silenced in reptiles and birds (Fig. 4). Regardless of the precise timing of CD1’s emergence, we propose that class II arose later and may have co-opted the CD1 pathway of Ag presentation. We found no class II genes outside of the MHC.
The overarching hypothesis is that all constituents/domains of current adaptive (and some innate) immune genes were genetically linked in the PIC (9), which predated the MHC (6), and these PIC components were “mixed and matched” to generate the precursors of modern immune genes (9), especially the VJ and C1-IgSF domains that are fundamental components of the adaptive immune system (e.g., Igs, TCR, MHC class I/II, B2M) (81–83). It was previously predicted that Ig/TCR/MHC precursor genes originated in the MHC based on preliminary evidence (6, 60). In addition to MHCpara, genes linked in the MHCtrans region also provide an indication of the primordial linkage of AgR/MHC; as mentioned above, other genes, like KIRREL, FcRL, and SLAMF, map to MHCtrans, corresponding to the human 1q21.1–23.3 region (Figs. 3A, 5A, Supplemental Table I). Therefore, other domains such as C2-IgSF (building blocks of FcRL and SLAMF) and B30.2 (building block for butyrophilin) (11) were also present in the PIC and likely used as raw material to generate new sets of immune genes. In addition, the synteny of SLAMF and CD1 genes may be another example of functional clustering, because SLAM family members are involved in NKT cell development in the thymus (84).
Jawless and jawed vertebrate immunological big bangs and the MHCpara
Finally, we also speculate on the dichotomy between the jawless and jawed vertebrate adaptive immune systems. Leucine-rich repeat (LRR) domain-containing variable lymphocyte receptor (VLR) genes are rearranging adaptive immune genes unique to jawless vertebrates (lamprey and hagfish) (85). LRR domains are also present in many other proteins such as TLR (86, 87), which are predicted to be encoded in the PIC because toll is linked to MHC paralogous hallmark genes in Drosophila (16). Pancer identified three VLR homologous genes based on the presence of the LRR carboxy-terminal domain (88), and, surprisingly, we found all three genes mapping to MHCpara regions: GP1BB is closely linked to IgLλ on human chr 22q11 (Figs. 1, 6) and Xenopus chr 1q (Supplemental Fig. 3) (linkage probability by chance <1 × 10^−16 [Table II]); Xenopus gp1ba and gp9 could not be mapped, but human GP1BA maps closely to PSMB6 on human chr 17p13.2, and GP9 maps on human chr 3q21.3, a region also designated as a minor MHCpara (60). Both GP1BB and GP1BA were mapped on chromosomes derived from huMHCpara-19. This unexpected result strongly suggests that the precursor of VLR genes was also in the PIC or an ancestral MHCpara. We have searched the lamprey and hagfish genomes for synteny of the VLR genes but could not map any linked genes. Better assembly of the lamprey and hagfish genomes could provide genetic evidence for further confirmation. Depending on the precursor of human chr 3, the VLR predecessor could have been present either at 1R or 0R (PIC). In either scenario, our model predicts that VLR predates the emergence of rearranging IgSF-containing AgR. At this point, we have no working hypothesis for why VLRs would be encoded in the MHCpara besides the basic idea that many immune gene families seem to have been conceived in these regions.
There was an expansion of gene families and neofunctionalization [e.g., globin genes (89)] in early jawed vertebrates shortly after 2R that was perpetuated in the gnathostome lineage. In contrast, the jawless fish either maintained the primordial state or evolved novel globin genes (89). We suggest that such a major dichotomy occurred for the immune system as well (Fig. 8): adaptive immunity likely emerged in the jawless vertebrates in the first “Big Bang” with major features such as clonal selection of lymphocytes bearing somatically generated Ag receptors, emergence of the thymus, and appearance of lymphocyte subsets (90). In our scenario, as opposed to a model proposing parallel evolution of VLR and Ig/TCR systems, the VLR system emerged during the first Big Bang, and then was superseded by the Ig/TCR system after invasion of a VJ-IgSF gene by the RAG transposon at 2R. As previously suggested, RAG-mediated rearrangement provides a distinct advantage over APOBEC-mediated recombination in that the CDR3 loop can be wildly different in size (91), accommodating either a rich adaptive repertoire or one that is more innate in nature. We suggest that the RAG transposon invasion at 2R was the innovative event that initiated a second Big Bang of adaptive immunity, resulting in the emergence of immunoproteasomes, emergence and expansion of AgR, and the first appearance of SLAM family members, all of which likely occurred on the chr 6 and chr 19 ancestral paralogues. Other features of the gnathostome adaptive immune system, such as emergence of secondary lymphoid tissues, expansion of cytokine and chemokine networks, and appearance of a complex thymic architecture, also occurred over a short period of evolutionary time, in some cases under the influence of genes mapping to MHC paralogous regions, e.g., TNF (92) and B7 family members (68).
Acknowledgements
We thank Hanover Matz and Dr. Louis Du Pasquier for critical reading of the manuscript and advice on the nonrearranging AgR-like genes.
Footnotes
This project was supported by National Institutes of Health Grants AI140326-26 and AI02877 to Y.O. and M.F.F.
The online version of this article contains supplemental material.
Abbreviations used in this article:
- chr: chromosome
- FISH: fluorescence in situ hybridization
- huMHCpara: human MHCpara
- IgSF: Ig superfamily
- L: long
- LRR: leucine-rich repeat
- MHCpara: MHC paralogue
- MHCtrans: translocated part of the MHC paralogous region
- NCBI: National Center for Biotechnology Information
- NKC: NK complex
- PIC: primordial immune complex
- S: short
- VLR: variable lymphocyte receptor.
References
Disclosures
The authors have no financial conflicts of interest.