Abstract
β2-Microglobulin (β2M) is believed to have arisen in a basal jawed vertebrate (gnathostome) and is the essential L chain that associates with most MHC class I molecules. It contains a distinctive molecular structure called a constant-1 Ig superfamily domain, which is shared with other adaptive immune molecules including MHC class I and class II. Despite its structural similarity to class I and class II and its conserved function, β2M is encoded outside the MHC in all examined species from bony fish to mammals, but it is assumed to have translocated from its original location within the MHC early in gnathostome evolution. We screened a nurse shark bacterial artificial chromosome library and isolated clones containing β2M genes. A gene present in the MHC of all other vertebrates (ring3) was found in the bacterial artificial chromosome clone, and the close linkage of ring3 and β2M to MHC class I and class II genes was determined by single-strand conformational polymorphism and allele-specific PCR. This study satisfies the long-held conjecture that β2M was linked to the primordial MHC (Ur MHC); furthermore, the apparent stability of the shark genome may yield other genes predicted to have had a primordial association with the MHC specifically and with immunity in general.
The adaptive immune system as defined in humans arose abruptly in a jawed vertebrate (gnathostome) ancestor ∼500 million years ago. The major players of adaptive immunity, the rearranging Ag receptors (Ig and TCR), the Ag-presenting molecules (MHC class I and class II), and molecules involved in Ag processing (e.g., immunoproteasomes and the TAPs) are all present in sharks as the oldest extant jawed vertebrates but absent in all invertebrates and jawless fish (1). The MHC encodes the class I and class II proteins, which present foreign peptides to T cells to initiate adaptive immune responses, as well as the Ag processing molecules and a large number of other genes involved in various immune functions. The class I and class II tertiary structures are nearly identical, composed of four external domains, the two membrane-proximal domains being members of the Ig superfamily (IgSF) and the membrane-distal domains forming a unique structure called the peptide-binding region (PBR). However, the chain composition differs between class I and class II molecules: class II molecules are heterodimers of α- and β-chains, each consisting of one half of the PBR, one IgSF domain, and transmembrane/cytoplasmic regions, whereas class I molecules are composed of an H or α-chain and the requisite L chain, β2-microglobulin (β2M), the former comprising the entire PBR, one IgSF domain, and transmembrane/cytoplasmic regions, and the latter only one IgSF domain. The remarkable similarity of the class I and class II structures clearly suggests that they were generated from a common ancestor, presumably by tandem duplication; thus, it has been assumed that class I, β2M, and class II genes were tightly linked at one point in evolution (1), although it is debatable whether the ancestor of class I/II molecules was class I-like, class II-like, or an unrecognizable common ancestor (2–5).
In all jawed vertebrates except teleost fish, a taxon having a highly modified genome correlating with a genome-wide duplication early in teleost evolution (6, 7), both MHC class I and II genes are closely linked within the MHC (8). Although class I genes are encoded in a region downstream of the class II region (2–4 Mb) in the MHC of most mammals, a single or low number of class I genes are found in close proximity to class I processing and (except teleost fish) class II genes in most nonmammalian vertebrates, in what is predicted to be the primordial organization (9–15). β2M is encoded in diverse regions outside the MHC in all the species examined to date, including mammals (16), birds (17), amphibians (18), and bony fish (19). The lack of linkage of β2M to the MHC and the inconsistent synteny around β2M have therefore been assumed to result either from repeated translocations out of the MHC over evolutionary time or from serial translocations after an early loss of MHC linkage (20).
In this study, we characterized the nurse shark (Ginglymostoma cirratum) single-copy β2M gene and mapped it to the MHC. The primitive synteny preserved in this extant vertebrate validates early suppositions regarding MHC evolution and further suggests that other ancient features of the MHC also may be preserved.
Materials and Methods
Animals
For mapping analysis, genomic DNA was isolated from RBCs of nurse shark family members as previously described (21). The animal use protocol was reviewed and approved by the Institutional Animal Care and Use Committee at the University of Maryland.
Bacterial artificial chromosome library screening
The 17 bacterial artificial chromosome (BAC) filters with 11-fold genomic coverage (22) were screened with radiolabeled full-length β2M or ring3 probes under high-stringency conditions (23). Membranes were exposed to x-ray film for various lengths of time to obtain positive signals and the desired background. Putative positive clones were then re-spotted on nylon membranes for colony hybridization and tested by Southern blotting to confirm true positives. BAC insert DNA was isolated using the PhasePrep BAC DNA kit (Sigma-Aldrich), and the sequence was determined by shotgun sequencing at the sequencing facility at Tokai University with 7.5× coverage.
Sequence alignment and phylogenetic tree
Database searches
Genome synteny in various species was retrieved and analyzed from publicly available Web sites as noted. Genes from mouse, chicken, human, opossum, and zebrafish were retrieved from GenBank (http://www.ncbi.nlm.nih.gov), and information on other genomes was retrieved from the following Web sites: elephant shark genome (http://blast.fugu-sg.org/); Anolis genome (http://genome.ucsc.edu/cgi-bin/hgGateway?db=anoCar1); Xenopus genome (http://genome.jgi-psf.org/Xentr4/Xentr4.home.html); and Fugu genome (http://genome.jgi-psf.org/Takru4/Takru4.home.html).
In-house EST collection
We constructed the cDNA library using the Gateway System (Invitrogen) from adult nurse shark pancreas. To eliminate Ig genes, we first hybridized with Ig H and L chain probes under high-stringency conditions. Negative colonies (∼8000) were then manually picked and sequenced from the vector end. All draft sequences were blastx searched against GenBank databases, and we obtained ∼1150 sequences not specific to the pancreatic enzymes (Y. Ohta and M.F. Flajnik, personal observations).
Single-strand conformation polymorphism analysis
Nurse shark ring3 primers were designed based on the sequence obtained from BAC clone GC_614H19. Multiple primers were tried, and we selected the primer set anchored in exons 4 and 5 for the single-strand conformation polymorphism (ssCP) analysis. The primers were exon 4 forward, 5′-GTTAACACCTGCACCAAAAT-3′; and exon 5 reverse, 5′-ATTGGGACCTGAGACACAGT-3′. PCR was performed at 94°C for 4 min, followed by 35 cycles of 94°C for 1 min, 62°C for 1 min, and 72°C for 1 min, with a final extension of 72°C for 10 min using 2–500 ng genomic DNA as template. The ∼1340-bp PCR product was cleaned by gel extraction. The ssCP gel (0.5× MDE gel; Cambrex Bio Science Rockland) was run at 16°C for 30 h in 0.6× Tris/borate/EDTA buffer at a constant power of 1 W.
Allele-specific PCR
Nurse shark β2M sequences were obtained from family 2 with known MHC haplotypes. PCR was performed using a forward primer in intron 2 (NSB2mint2For: 5′-TTACACATCACCACCACCTC-3′) and a reverse primer designed from the IgSF exon (exon 3) (NSB2mex3Rev: 5′-GATTGATTCAGTAGC-3′). We amplified β2M gene fragments from several animals carrying different maternal and paternal haplotype combinations to find allele-specific polymorphisms. After we identified a two-nucleotide deletion in intron 2 in the paternal haplotype (p3) of animals belonging to groups “i” and “j,” allele-specific primers were designed for each allele such that the deletion site lies at the third and fourth nucleotide positions from the 3′-end of the primers. PCR was performed using a combination of allele-specific and NSB2mex3Rev primers at 94°C for 4 min, followed by 35 cycles of 94°C for 1 min, 58°C for 1 min, and 72°C for 1 min, with a final extension of 72°C for 10 min using 2–500 ng genomic DNA as template. We also found animals with the “CC-deletion” allele in two other families (1 and 3).
Northern blotting
Total RNA was isolated from various nurse shark tissues by using the TRIzol reagent (Invitrogen). Twenty micrograms of total RNA was electrophoresed and blotted onto Optitran Nitrocellulose membrane (Schleicher & Schuell). The membrane was hybridized with full-length shark probes and washed under high-stringency conditions (23).
Southern blotting
Genomic DNA (10 μg) from unrelated sharks was digested with multiple restriction enzymes to search for useful RFLPs. The IgSF exon was used to determine the number of loci for β2m under high-stringency conditions (23). Hybridization with MHC class I leader and α1 domain probe was performed under low-stringency conditions (23). To determine the MHC groups in shark family 2, we digested genomic DNA with HindIII and hybridized it with a radiolabeled probe including the leader–α1 domains of MHC class I under high-stringency conditions.
Sequence analysis of MHC class I alleles and sire designation
MHC class I sequences were obtained by PCR amplification with the following primers: α1 domain forward, 5′-GGTCGGTTATGTGGATGATC-3′; and α2 domain reverse, 5′-TTGCAGCCACTCGATACA-3′. PCR amplification was performed for 4 min at 94°C, followed by 35 cycles of 94°C for 1 min, 56°C for 1–2 min, 72°C for 1 min, and a final extension at 72°C for 10 min. An ∼550-bp amplicon was cloned into the pCRII TA cloning vector (Invitrogen), and individual clones were sequenced. Nurse shark families 2 and 3 were genotyped using 12 DNA microsatellite markers and assigned sires (E.J. Heist, J.C. Carrier, H.L. Pratt, and T.C. Pratt, submitted for publication).
Statistical analysis of linkage
We used parametric linkage analysis to formally assess the evidence for linkage of β2M to the MHC region in the offspring of deletion-carrying sires. This approach compares the likelihood of obtaining the observed data set if the two loci are linked with the likelihood if they are not linked, expressed as a log of the odds (LOD) score. The paternal sibships were determined based on consolidated data from a combination of Southern blotting, sequencing of MHC class Ia alleles, and microsatellite analyses (shown in Table I).
Sib No. | MHC Old Group | MHC New Group | MHC Class Ia Haplotype | β2m Haplotype | Sire (m-Satellite) | Phase 1 | Phase 2
---|---|---|---|---|---|---|---
Family 2 | | | | | | |
15 | i | i | m2/p3 | del | 4 | NR | R
30 | i | i | m2/p3 | del | 4 | NR | R
21 | j | j | m1/p3 | del | 4 | NR | R
25 | j | j | m1/p3 | del | 4 | NR | R
31 | j | j | m1/p3 | del | 4 | NR | R
33 | j | j | m1/p3 | del | 4 | NR | R
39 | j | j | m1/p3 | del | 4 | NR | R
20 | e′ | e′ | m1/p6 | ins | 4 | NR | R
32 | e′ | e′ | m1/p6 | ins | 4 | NR | R
36 | e′ | e′ | m1/p6 | del | 4 | R | NR
28 | g′ | g′ | m2/p6b | ins | 4 | NR | R
13 | c | g′ | m2/p6b | ins | 4 | NR | R
Family 3 | | | | | | |
8 | | g | m2/p2 | del | 2 | NR | R
23 | | g | m2/p2 | del | 2 | NR | R
6 | | d | m1/p4 | ins | 2 | NR | R
7 | | d | m1/p4 | ins | 2 | NR | R
9 | | d | m1/p4 | ins | 2 | NR | R
19 | | d | m1/p4 | ins | 2 | NR | R
Old group is taken from Ref. 28, and new groups are assigned in this study.
MHC class Ia sequences revealed that sib 13 is further categorized with group g′ in this study.
del, CC-deletion haplotype; NR, nonrecombinant; R, recombinant.
The LOD score is calculated as follows when parental phase (linkage status) is known: $\mathrm{LOD} = \log_{10}\left\{\left[\theta^{R}(1-\theta)^{NR}\right]/(0.5)^{R+NR}\right\}$, where θ is the recombination fraction, NR is the number of nonrecombinant offspring, and R is the number of recombinant offspring.
Because the parental phase was unknown in the current study due to a lack of grandparental genotypes, a phase-ambiguous LOD score was first calculated for each family by taking the log of the average odds for the two possible phases (1 and 2 in Table I), and the resulting LOD scores were then summed over the two families to obtain the LOD score at a given recombination fraction. LOD scores were calculated at recombination fractions between 0 and 0.5 to find the recombination fraction at which the LOD score was maximized (26). The corresponding p value was calculated using a one-sided χ2 test on 2 × loge(10) × LOD (27).
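The phase-ambiguous LOD calculation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' own script: the function and variable names are hypothetical, the recombinant and nonrecombinant counts are read from the phase 1 columns of Table I (family 2: 11 NR, 1 R; family 3: 6 NR, 0 R), the 0.001 grid step is an assumption, and the one-sided p value is taken as half the upper tail of a χ2 distribution with 1 degree of freedom.

```python
import math

# Counts under phase 1, read from Table I (phase 2 simply swaps NR and R).
# The dictionary layout and all names here are illustrative assumptions.
FAMILIES = {
    "family 2": {"NR": 11, "R": 1},
    "family 3": {"NR": 6, "R": 0},
}


def phase_ambiguous_lod(theta, n_rec, n_nonrec):
    """LOD for one family when parental phase is unknown.

    The likelihoods of the two possible phases are averaged and then
    compared with the no-linkage likelihood (theta = 0.5).
    """
    n = n_rec + n_nonrec
    phase1 = theta ** n_rec * (1 - theta) ** n_nonrec
    phase2 = theta ** n_nonrec * (1 - theta) ** n_rec
    return math.log10(0.5 * (phase1 + phase2) / 0.5 ** n)


def total_lod(theta):
    """Sum of the per-family phase-ambiguous LOD scores."""
    return sum(
        phase_ambiguous_lod(theta, fam["R"], fam["NR"])
        for fam in FAMILIES.values()
    )


# Grid search over (0, 0.5]. theta = 0 is skipped because family 2
# contains a recombinant, which makes the likelihood zero there.
thetas = [i / 1000 for i in range(1, 501)]
best_theta = max(thetas, key=total_lod)
max_lod = total_lod(best_theta)

# Odds of linkage versus no linkage implied by the maximum LOD score.
odds = 10 ** max_lod

# One-sided chi-square test (1 df) on 2 * ln(10) * LOD.
# For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)); the one-sided value is half.
chi2_stat = 2 * math.log(10) * max_lod
p_value = 0.5 * math.erfc(math.sqrt(chi2_stat / 2))

print(f"theta = {best_theta:.3f}, LOD = {max_lod:.2f}, "
      f"odds = {odds:.0f}:1, p = {p_value:.1e}")
```

Under these assumptions the grid maximum falls at the recombination fraction expected from 1 recombinant among 18 informative offspring, and at θ = 0.5 the summed LOD is zero, as it should be for unlinked loci.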
Results
Characterization of nurse shark β2M
Cartilaginous fish are the oldest living vertebrates having an adaptive immune system centered upon Ig, TCR, and MHC (1). When studies of teleost fish suggested that class I and class II genes may have evolved in separate linkage groups (28), we demonstrated in family studies that the two MHC classes were closely linked in two shark species, nurse shark and banded houndshark (21). To gain further insight into the primordial MHC organization, we have isolated many shark genes associated with adaptive immunity, including β2M. The full-length β2M clone was found in an in-house EST collection (GenBank accession number HM625831), as well as in a previously published genomic sequence (GenBank accession number GQ865623) (29), and the deduced amino acid sequence was aligned with β2M from other species (S1). As was noted in previous studies, evolutionarily conserved residues are either found in all C1-IgSF (or just IgSF) domains (29, 30) or are predicted to be at class Iα-chain interaction sites (31). Some cartilaginous fish β2M have potential N-glycosylation sites that are rare in tetrapods but present in several bony fish species (32). Consistent with previous studies (33, 34), phylogenetic tree analysis revealed that cartilaginous fish β2M clustered with the orthologous proteins and with the IgSF domains of MHC class IIA/DMA, suggesting that they share the most recent common ancestor (Fig. 1A). Also consistent with previous studies (33), the IgSF domains of class IIB and class Ia shared the most recent common ancestor. The β2M expression pattern seems to coincide with MHC class I expression (Fig. 1B).
A, Phylogenetic tree analysis of β2M. GenBank accession numbers used for this analysis are as follows. β2M: M17987 (human), X69084 (bovine), NM_009735 (mouse), Y00441 (rat), P01885 (rabbit), P01886 (guinea pig), M84767 (chicken), P21612 (turkey), AAM98336 (opossum), BQ389924 (X. tropicalis), AAF37230 (X. laevis), L05536 (carp), NP_571238 (zebrafish), L63534 (trout), CAA10761 (cod), AAG17535 (salmon), CAB61324 (Siberian sturgeon), AAN40738 (Japanese flounder), CAD44965 (African barb), O42197 (catfish), CA330181 (Fugu), AAN62852 (skate), and CX197532 (dogfish). Class IIa: AAF66123 (nurse shark), AAL58430 (X. laevis), AAA59760 (human), AAV40625 (rat), NP_001001762 (chicken), XP_001376764 (opossum). Class IIb: AAF82681 (nurse shark), AAB86437 (human), NP_001008884 (rat), BAA02845 (X. laevis), NP_001038144 (chicken), AAB68822 (opossum). Class Ia: BAD92354 (human), AAC53397 (rat), AAL59857 (nurse shark), NP_001079241 (X. laevis), AAG28835 (chicken), NP_001165308 (opossum). IgM: AAD21191 (opossum), P01871 (human), AAH92586 (rat). DMB: ABB85336 (X. laevis), NP_002109 (human), NP_942035 (rat). DMA: NP_006111 (human), NP_942036 (rat), ACY01474 (chicken), XP_001377359 (opossum). The NJ tree was rooted with the fourth constant IgSF domains of IgM, and bootstrapping analysis was done after 1000 runs. Values are noted at the branch nodes, and the asterisk (*) indicates no significant value. The scale indicates divergence time (genetic distance). Teleost fish that underwent a third round of genome expansion (“3R”) are omitted from this analysis because the sequences were more divergent and skewing the tree topology. DM genes have not been identified in any fish. B, Expression profiles of β2M, class Ia, and ring3 via Northern blotting. Twenty micrograms of total RNA isolated from various nurse shark tissues was loaded onto the gel, blotted, and hybridized with full-length shark β2M and ring3 probes and washed under high-stringency conditions (23). 
Nucleoside-diphosphate kinase (NDPK) (35) was used as a loading control. C, There is only one β2M locus in the nurse shark genome. Genomic Southern blot analysis was performed under low-stringency conditions (23) using the IgSF exon with three wild sharks (a, b, c) whose DNA was digested with five different restriction enzymes (from left to right: Bam HI, Eco RI, Hin dIII, PST I, and Sac I).
Mapping of β2M to the MHC in family studies
Two families of nurse sharks previously were used to map several genes to the MHC (21, 36, 37). Both families showed multiple paternity, with at least five fathers in family 1 and seven in family 2. Southern blotting analysis using many restriction enzymes demonstrated that β2M is a single-copy gene (five representative digestions are shown in Fig. 1C); unfortunately, no RFLPs were obtained to test the linkage status, and thus we sequenced the gene from animals with different MHC haplotypes, hoping to find polymorphisms. A two-nucleotide deletion was detected in one of the paternal β2M alleles, “p3,” from groups “i” (p3/m2) and “j” (p3/m1) from family 2 with 39 members (Fig. 2A), and allele-specific PCR was performed in all members of the nurse shark families in our collection (Fig. 2B). Family 1 had two positive members that shared the same paternal MHC haplotype (group “h”) (Fig. 2C). In family 2, all seven members of groups “i” and “j” bearing the paternal MHC haplotype “p3” were positive, as was one other offspring belonging to the “e′” group. Family 3 with 29 offspring, which had not been MHC-typed previously, was tested, and two members were positive for the β2M polymorphism (Fig. 2C). Typing of this family by Southern blotting as well as sequencing of the class Ia alleles in all offspring showed that these two animals share the same paternal MHC haplotype (Fig. 2C). Thus, a total of 11 of 12 siblings positive for the β2M polymorphism in three families showed precise cosegregation with certain MHC haplotypes. In addition, 73 of 74 siblings with many other haplotypes lacked this polymorphism, further strongly indicating that β2M does not segregate independently of the MHC. The one discordant animal in family 2 (sib 36, group “e′”) was also typed by microsatellite analysis and shown to have been sired by the same father as offspring in the “i” and “j” groups; thus this father had the MHC haplotypes “p3” and “p6” (E.J. Heist et al., submitted for publication), consistent with a paternal intra-MHC recombination event in sib 36.
To quantify formally the evidence for linkage of β2M to the MHC, we considered all offspring of the two deletion-carrying sires (found within families 2 and 3) as assigned by Southern blotting with class I probes (Fig. 2C) (36), sequences of MHC class I alleles (Fig. 2C, Table I), and microsatellite analysis (E.J. Heist et al., submitted for publication) (Table I). Family 1 sires have not been microsatellite-characterized, and therefore family 1 was not included in the analysis. We performed a parametric linkage analysis (26) to evaluate the evidence for β2M and MHC synteny and obtained a maximum LOD score of 3.14 [1378:1 odds of linkage versus no linkage, equivalent to p = 7 × 10−5 (27)] at a θ of 0.056 (Supplemental Table I, Fig. 2D).
The shark β2M is linked to the MHC. A, The two-nucleotide (CC) deletion polymorphism was found in intron 2 of β2m sequences in the “p3” paternal allele from siblings belonging to the groups “i” and “j.” Thus, allele-specific primers were designed based on this polymorphism. All primers are underlined. The ends of coding regions are boxed. The (AG) at the end of intron 2 is underlined. B, PCR was carried out with a combination of allele-specific and universal NSB2mEx3Rev reverse primers. Presence or absence of the amplicon using the “p3”-specific primers was used for typing family 2 with 39 offspring (top gel). Maternal primers were used for the positive control (bottom gel). Forward primers are indicated on the left side of the gels, and mother and sibling numbers are indicated above the gel along with MHC groups (36). C, Allele-specific PCR in families 1 and 3. In family 1, only two animals belonging to the MHC group “h” possessed the “CC-deletion” allele, and in family 3, two animals belonging to the “g” group had this allele. We partially typed family 3 based on the MHC groups by sequencing of the PBR of the class Ia alleles (maternal and paternal alleles are designated as numbers above the gel) and by Southern blotting with a probe containing MHC class Ia leader and α1 domains (small dot, band for maternal haplotype 1; large dot, maternal haplotype 2). The “p2” allele of the “g” group is the only haplotype possessing the “CC-deletion” allele of β2M. D, Plot of LOD scores at corresponding recombination fractions. The sums of the two families were used (Supplemental Table I).
β2M is adjacent to MHC-linked Ring3
Ring3 (or BRD2) is a putative nuclear transcriptional regulator and a nuclear kinase required for early development (38–41) with no defined immune functions but nevertheless linked to the MHC of all other gnathostomes and to the “proto-MHC” in lower deuterostomes (42). A portion of ring3 was initially cloned via degenerate PCR from nurse shark spleen cDNA, and this short fragment was used as a probe to isolate a full-length cDNA from a phage library. BLAST searches and phylogenetic tree analysis confirmed the orthology of nurse shark ring3 to that of other species (GenBank accession number HM625830) (Fig. 3A). The nurse shark ring3 is ubiquitously expressed (Fig. 1B). To ensure that the shark ring3 is linked to the MHC as in all other species examined (8), we performed ssCP analysis using siblings of family 2 (Fig. 3B). Two distinguishing ring3 bands corresponding with the maternal MHC allele m2 were found in those siblings possessing this allele (groups “i” and “d” in Fig. 3) with 100% fidelity, demonstrating that ring3 is closely linked to the MHC and further confirming the β2M linkage. We identified other BAC clones that were either β2M- or ring3-single-positive; unfortunately, none of them was positive for other MHC genes, again consistent with larger intergenic distances in sharks compared with those of other species (36). Chen et al. (29) drew a premature conclusion of non-MHC linkage; however, determining the linkage status of β2M (or almost any gene) based on a single BAC sequence is not sufficient for the shark genome, where there are large intragenic and intergenic distances. Several nurse shark BAC clones (22) were isolated with the ring3 and β2M probes, and some of them were positive for both genes. As previously reported (29), the β2M gene contains at least three exons, having a similar genomic organization and size to other species. 
The shark ring3 gene contains 12 exons and spans ∼20 kb, approximately twice the size of mammalian ring3 genes (e.g., 12.8 kb and 9.7 kb for human and mouse, respectively), consistent with the larger gene size found for most shark MHC genes (36). Sequencing through an entire BAC clone (GC_614H19) confirmed that the β2M and ring3 genes were adjacent to each other, ∼45 kb apart (Fig. 4).
A, Phylogenetic tree analysis of Ring3 and homologues. GenBank accession numbers used in this analysis are as follows. Ring3 (BRD2): CAM25760 (human), AAY34703 (bovine), CAI11405 (dog), CAA15819 (mouse), CAE83937 (rat), XP_001369391 (opossum), CAN13285 (pig), CAA65449 (chicken), BAC82511 (quail), AAI68574 (X. tropicalis), AAI30180 (X. laevis), CAK04960 (zebrafish-1), CAD54663 (zebrafish-2), ABQ59684 (salmon), BAD93258 (medaka). Additional accession numbers for Ring3 homologues used for this analysis are the following: BRD3: AAI29055 (X. laevis), NP_031397 (human), NP_075825 (mouse), XP_001365890 (opossum), XP_425330 (Chicken). BRDT: NP_473395 (mouse), NP_997072 (human), XP_537079 (dog). BRD4: NP_490597 (human), NP_065254 (mouse), NP_001104751 (zebrafish), AAH76786 (X. laevis). BRD1: NP_001157300 (horse), XP_698063 (zebrafish), NP_001085846 (X. laevis), CAG30294 (human). Gene names are noted after species name. BRD1 does not map to an MHC paralogous region, whereas BRDT, BRD3, and BRD4 are found in the MHC paralogous regions. The tree was constructed using the NJ method, rooted with BRD1, and bootstrapping analysis was done with 1000 runs. Values are noted at the branch nodes, and an asterisk (*) indicates no significant value. The scale indicates the divergence time. B, The shark ring3 maps to the MHC. Primers from exons 4 and 5 were used for PCR amplification and ssCP analysis. The ∼1440-bp amplicon from the siblings along with mother shark genomic DNA were loaded on an 0.5× MDE gel. Under these conditions, “m2” was identified as two distinctive bands indicated as arrows. Mother and sibling numbers are indicated above the gel along with MHC groups and haplotype combinations from previous work (36).
Map of BAC clone GC_614H19. Each exon is shown as a box, and transcriptional orientation is indicated by an arrow in the 5′ to 3′ direction. Only one exon of the ZFP112-like gene was identified, based on similarity to other species. The positions of repetitive elements, classified into four categories, are shown above the map. Total interspersed repeats make up ∼5.35% of the sequence, consisting of ∼4.74% LINEs and ∼0.63% simple repeats. The sequence has been deposited in the DNA Data Bank of Japan under accession number AB571627.
Genetic descent of β2M
The chromosomal location of the β2M gene varies greatly among vertebrate species (Fig. 5). Genomic synteny is well conserved in the region of chicken β2M relative to humans except for deletions of certain genes (43), and the same seems to be true for the Anolis lizard, in which the synteny near the β2M gene (GenBank accession number FG703784, etc.) is conserved (genomic scaffold-670, 634,364 bp) (Supplemental Table II). Mouse β2M is linked to the so-called minor histocompatibility complex on chromosome 2 (16) and is located within a small region syntenic to human chromosome 15 (43). Notably, in a marsupial, the opossum, a smaller syntenic block is embedded among genes mapping to human chromosome 14q11.2. Although these regions can be accounted for by block translocations or syntenic breakpoints, synteny is not conserved in species from lower vertebrate classes, in which β2M is surrounded by genes mapping to various human chromosomes. The amphibian Xenopus β2M is linked to genes mapping to human chromosomes 16 and 17 (genomic scaffold-673). In zebrafish, β2M (chromosome 4) is surrounded by genes mapping to human chromosome 12p12, and the Fugu scaffold-171 (638,182 bp) contains regions syntenic to various locations in the human genome. As mentioned above, the teleost fish experienced a recent genome-wide duplication (“3R”), and there is another β2M locus in the zebrafish genome that is ∼60% similar to its paralogue at the amino acid level. Notably, the second β2M locus is found at the telomeric region of chromosome 8 and is distantly linked to a class IIA gene and two class Ib genes of the L-lineage (44) (Supplemental Table II). Although the β2M linkage in this chromosomal region is not very close (6.5 Mbp apart), given the rapid reorganization of syntenic regions in teleost fish, this class II/class I/β2M linkage group is likely a vestige of the primordial synteny.
Combining all of the evidence, our study in nurse shark demonstrates that β2M was originally encoded in the MHC, and extensive database analysis in many taxa indicates that this gene underwent multiple translocations out of the MHC in gnathostomes, either stepwise or independently (Fig. 5).
Inconsistent synteny of β2M among vertebrate species. Genomic synteny of β2M is not consistent in bony fish and Xenopus, suggesting that multiple translocations of β2M occurred over evolutionary time. An asterisk (*) indicates the location of the β2M gene, and brackets indicate the genomic regions corresponding with the particular human chromosome. The detailed gene assignments can be obtained in Supplemental Table II. IgH and TCRα loci are marked in opossum chromosome 1.
Discussion
Compared with other vertebrate models (e.g., chicken or teleost fish), the shark genome seems to be stable, first demonstrated with the linkage of MHC class I and II genes (21), which was lost in bony fish (28), and later with linkage conservation of genes found in the mammalian MHC class III region (37). These MHC linkage data are consistent with global genomic studies in the elephant shark suggesting that cartilaginous fish have greater preservation of synteny than is found in any teleost model (45, 46). The β2M linkage to the shark MHC demonstrated here is likely the primordial condition, thus further supporting the conservation of the cartilaginous fish genome. Furthermore, the close proximity of class I, class II, and β2M is consistent with the theory that they were derived from a common ancestor by tandem (cis) duplication. The close linkage of β2M and class I may have regulated their original coordinated expression and upregulation. Class I and β2M expression is nearly identical in the nurse shark (Fig. 1B), but in other vertebrates β2M is made in excess (47). Furthermore, the number of β2M loci is expanded in rainbow trout (48) and polyploid Xenopus species (18).
Unlike class II genes, class I genes are extraordinarily plastic. Besides the MHC-linked classical class Ia genes, there are also many nonclassical class Ib genes with varied functions, some encoded in the MHC and others not. The majority of class Ib proteins associate with β2M as well, and it has been speculated that translocation of β2M out of the MHC was advantageous because the gene would then not be subject to the duplications and deletions (19) that affect class I genes in many vertebrates. Consistent with the idea of maintaining genomic stability, but in contrast to class I and class II genes, both β2M and ring3 genes are in a very stable part of the shark MHC, with very few polymorphisms and transposable elements (Fig. 4); no polymorphism was detected by restriction enzyme digestion/Southern blotting with either the ring3 or β2M probe. Although there are a few bony fish species in which the number of β2M loci has been expanded (49), and there are two loci in the tetraploid Xenopus laevis (18), these species are generally exceptions. There seems to be only one β2M locus in the nurse shark genome, because genomic Southern blotting with many restriction enzymes yielded a single band with an exon-specific probe (Fig. 1C).
The primordial linkage of β2M to the MHC does not contribute to the debate on which gene came first, class I or class II. Among the various IgSF domains, the C1-type is a rare form, found primarily in molecules associated with adaptive immunity (50). Therefore, it is reasonable to propose that C1-type IgSF-encoding genes like β2M were present in the “proto-MHC,” which then acquired the PBR from another gene family. Furthermore, it has been speculated that all molecules containing C1-type IgSF domains arose from a common ancestor, and thus an Ig/TCR precursor may have originated from the “proto-MHC” (20). Consistent with previous studies dating back almost 30 y (3, 5, 33, 34), our phylogenetic analysis demonstrated a common origin for the class IIA/DMB/β2M and the class Ia/DMA/class IIB lineages, and all of these genes share an ancestral C1 domain-encoding exon that emerged after the split between Ag receptors and MHC genes (Fig. 1B). Whereas class IIA, β2M, class IIB, and class Ia share an immediate common ancestor that arose by tandem duplication from the ancestral molecule, each DM gene was apparently generated by tandem duplications of class IIA and class IIB, perhaps early after the emergence of tetrapods, as no DM genes have been found in teleost or cartilaginous fish; the maximum likelihood and Bayesian inference trees favor this scenario (S2). In the NJ tree (Fig. 1B), however, shark class IIA and IIB genes cluster with class II genes from other species rather than at the basal position of class II/DM, suggesting that sharks may indeed possess DM.
An orthologous gene related to the ancestor of ring3 is present in the cephalochordate (e.g., amphioxus) “proto-MHC” (42), and thus the MHC linkage of ring3 in sharks is not surprising. To determine the linkage status in other cartilaginous fish species, we examined the elephant shark genome. Current analysis of the elephant shark genome (46) has yielded only a short (<1 kbp) scaffold (AAVX01540028.1), in which we identified only the β2M C1 domain. Three scaffolds were found to contain some exons of the elephant shark ring3 gene [AAVX01538535 (754 bp), AAVX01069837 (5232 bp), AAVX01012433 (4324 bp)]; however, the assembly is still in its early stages. Further progress in this genome project will reveal the synteny around β2M and all of the other MHC genes and likely provide insight into the natural history of the adaptive immune system by revealing other genes that have been translocated out of the MHC during vertebrate evolution. For example, there is good evidence from various vertebrates that both IgSF- and C-type lectin-containing NK cell receptor genes (in humans, they are encoded in the leukocyte receptor complex and the NK complex, respectively) and the MHC were genetically linked at an early point in vertebrate evolution (20, 51, 52), suggesting that NK receptors co-evolved with MHC proteins. We have found a fragment of a zinc finger protein (ZFP) gene, ZFP112-like, in BAC clone GC_614H19, adjacent to β2M (Fig. 4). ZFP112 is found on human chromosome 19q13.2 near FcRn (19q13.3), a nonclassical class Ib molecule, and the leukocyte receptor complex (19q13.4). This region had been suggested to be an MHC paralogous region by pericentric inversion of 19p13.1. Whether the nurse shark ZFP112-like gene is a pseudogene or is divergent from human/rodent ZFP112 genes, its linkage to β2M suggests that the linkage of NK receptor(s) and MHC could be preserved in the shark genome.
Furthermore, we found β2M on the same chromosome as TCRα/δ in horse (chromosome 1) and cow (chromosome 10), and on the same chromosome as both TCRα/δ and Ig in the opossum genome (Fig. 5, Supplemental Table II). In addition, Ag receptor loci and other genes involved in immune defense (e.g., B7 ligands and Fc-like receptors) are linked to genes related to the Xenopus MHC (Y. Ohta and M.F. Flajnik, manuscripts in preparation), and cathepsins S and L are found on MHC paralogous regions in mammals (20). Such evidence is consistent with our hypothesis that Ag receptors (TCR, Ig), NK receptors, and other genes involved in Ag processing and generally in immune function might have been linked in a “pre-adaptive immune complex” in the ancestral configuration.
Acknowledgements
We thank Dr. Mike Criscitiello and Caitlin Doremus for critical reading.
Footnotes
This work was supported by National Institutes of Health Grant AI27877 (to Y.O., R.L.L., and M.F.F.) and by Scientific Research on Priority Areas “Comparative Genomics” (20017023) from the Ministry of Education, Culture, Sports, Science and Technology of Japan (to T.S., K.H., S.S., and H.I.). E.H. was supported by the Department of Zoology at Southern Illinois University Carbondale and by the W.W. Diehl Endowed Professorship of Biology to J. Carrier at Albion College.
The sequences presented in this article have been submitted to the DNA Data Bank of Japan under accession number AB571627 and to GenBank under accession numbers HM625830 and HM625831.
The online version of this article contains supplemental material.
References
Disclosures
The authors have no financial conflicts of interest.