Cartilaginous fish, or Chondrichthyes, are the oldest extant vertebrates to possess the MHC and the Ig superfamily–based Ag receptors, the defining genes of the gnathostome adaptive immune system. In this work, we have identified a novel MHC lineage, UEA, a complex multigene nonclassical class I family found in sharks (division Selachii) but not detected in chimaeras (subclass Holocephali) or rays (division Batoidea). This new lineage is distantly related to the previously reported nonclassical class I lineage UCA, which appears to be present only in dogfish sharks (order Squaliformes). UEA lacks conservation of the nine invariant residues in the peptide (ligand)–binding regions (PBR) that bind to the N and C termini of bound peptide in most vertebrate classical class I proteins, which are replaced by relatively hydrophobic residues compared with the classical UAA. In fact, UEA and UCA proteins have the most hydrophobic-predicted PBR of all identified chondrichthyan class I molecules. UEA genes detected in the whale shark and bamboo shark genome projects are MHC linked. Consistent with UEA comprising a very large gene family, we detected weak expression in different tissues of the nurse shark via Northern blotting and RNA sequencing. UEA genes fall into three sublineages with unique characteristics in the PBR. UEA shares structural and genetic features with certain nonclassical class I genes in other vertebrates, such as the highly complex XNC nonclassical class I genes in Xenopus, and we anticipate that each shark gene, or at least each sublineage, will have a unique function, perhaps in bacterial defense.
This article is featured in Top Reads, p.751
The Chondrichthyes or cartilaginous fish (chimaeras, rays, and sharks) are the oldest living animals with the MHC-based adaptive immune system, which is present in all jawed vertebrates or gnathostomes (1, 2). Studies of cartilaginous fish MHC genes have shown them to be multifaceted, with different class I/II lineages and high polymorphism of the classical class I (UAA) and class II genes (2–7). However, this group of early-branching vertebrates also preserves several ancestral features of the immune system that were lost in some or all “higher” vertebrates, such as the cluster-type organization of Ig genes (8), close genetic linkage of MHC class I processing and presenting genes (9, 10), somatic hypermutation of TCR genes (11), and the MHC linkage of β2-microglobulin (β2m) (12).
Classical MHC class I molecules (class Ia) are ubiquitously expressed and bind peptidic Ags in the peptide-binding region (PBR), comprised of the class I α1 and α2 domains, and present the peptides to CD8-positive T cells. Class Ia molecules also have an α3 domain belonging to the C1-type of the Ig superfamily (IgSF), a connecting piece (CP), a transmembrane (TM) domain, and a cytoplasmic tail (CYT) (13, 14). The class I α-chain, made up of three domains, forms a heterodimer with the IgSF C1 member β2m. Nonclassical class I molecules (class Ib), although generally having the same basic structure as class Ia, can be distinguished from class Ia molecules by their low levels of polymorphism, tissue-specific expression, and/or lack of conservation of 9 aa residues that bind to the N and C termini of the bound peptide in the PBR (2, 15–18). Unlike class Ia molecules, class Ib molecules like CD1 and MR1 present nonpeptidic ligands to innate T cells (NKT, MAIT, and γδ cells), and others like FcRn have a “closed” PBR, bind no Ag in the groove at all, and serve other functions besides Ag presentation (19). Class Ib genes can either be MHC linked or map to other chromosomal regions [for example, see (1, 15)].
Four class I lineages have been identified so far and characterized in Chondrichthyes, namely the classical UAA (2) found in all gnathostomes and the nonclassical UBA (2), UCA (4), and UDA (5). Previous work showed that UBA is multigenic and was derived from a UAA ancestor (2, 5), whereas UDA is an ancient (as old as UAA), MHC-linked, single or low-copy lineage found in all cartilaginous fish (5). UCA was detected as a highly divergent class I gene in the spiny dogfish Squalus acanthias (4) but was not studied further.
In this study, we have identified a novel shark-specific nonclassical class I lineage, UEA, distantly related to UCA in the PBR and present in all shark species examined (division Selachii), which emerged ∼350 million y ago (MYA) (20). Unlike UAA and UDA, UEA is found in a large multigene family with three distinct sublineages. In this study, we describe the fundamental characteristics of this new, complex class Ib gene family and speculate on its relationship to the previously described UCA and many other vertebrate class I lineages.
Materials and Methods
Several sequences representative of all MHC class I lineages described in chondrichthyan taxa (sharks, rays, and chimaeras) were used as templates to search for additional MHC class I–like sequences in this taxonomic group. Searches were performed on Sequence Read Archive (SRA), Transcriptome Shotgun Assembly (TSA) and Whole Genome Sequence (WGS) databases publicly accessible on the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov, as of November 1, 2019) using the following amino acid sequences as queries: MHC class Ia UAA: nurse shark (Ginglymostoma cirratum), Gici UAA: AAF66110; clearnose skate (Raja eglanteria), Raeg UAA: AGH32767; and elephant shark (Callorhinchus milii), Cami UAA: AFP04519; MHC class Ib UBA: nurse shark, Gici UBA: AAC60347; horn shark (Heterodontus francisci), Hefr UBA: AAC60348; whale shark (Rhincodon typus), Rhty UBA: XP_020392556; MHC class Ib UCA spiny dogfish (S. acanthias), Sqac UCA: AAN78091; MHC class Ib UDA: nurse shark, Gici UDA: MN339476; little skate (Leucoraja erinacea), Leer UDA: LS-transcriptB2-ctg40765 (skatebase.org), and the elephant shark, Cami UDA: AFP03335. Some SRA datasets from RNA sequencing (RNAseq) bioprojects on Selachii taxa were excluded from the analysis when the data referred to species whose genera had multiple datasets and included nonimmunity-related tissues (e.g., retina, pectoral fin bud, or ampullary receptor cells). In the case of Batoid (rays and skates) and Holocephalan taxa (chimaeras and ratfish), all datasets were included in the analysis because of their underrepresentation in public databases. A total of 34 species covering the main chondrichthyan orders were included in the analysis (more details in Supplemental Table I). This comprised 1 species of Holocephalans, 23 species of Selachians, and 9 species of Batoids. All but one order of Batoids (sensu Ref. 21) and five out of the nine orders of Selachians (sensu Ref. 22) are represented.
Depending on the database used (i.e., TSA, SRA, or WGS) (Supplemental Table I), we applied different protocols to extract MHC class I sequences. Briefly, for TSA, we downloaded the available transcripts and used blastx to search for matching sequences against the set of MHC class I queries (see information above). We only retained transcripts with blast e-values less than 0.001 and covering more than 50% of the query sequence. For SRA, raw reads were downloaded, cleaned using Trimmomatic (23), and then assembled using SPAdes (24) to obtain the final transcripts. Yielded transcripts were subjected to the same blastx procedure as described for TSA. For the WGS dataset, a direct search on the contigs for every species was performed using as the query a set of known proteins belonging to the MHC class I (the same query sequences used above) and performing a tblastn to locate the exact positions in the contigs using very restrictive parameters (e-value 10−7). After a manual refinement to avoid overlaps or redundancy, the corresponding gene was extracted from the contigs, including a flanking region of 200 nts, to ensure the whole gene sequence was included. The sequences obtained were translated to amino acids, and only those having basic features of MHC class I structure, namely the PBR (α1 and α2 domains) and the IgSF α3 domain, were further analyzed. To confirm the main characteristics of the MHC molecules, the ScanProsite tool was also used (https://prosite.expasy.org/scanprosite). We excluded from further analyses all of the nearly identical sequences (differing by 3 aa or less) to be conservative and reduce methodological errors.
Sequence alignments and phylogenetic trees
Deduced amino acid sequences were aligned using the ClustalX program in Geneious Prime v2.1 (25) with manual adjustments (representative alignment in (Fig. 1; Supplemental Fig. 1, Supplemental Table II). The neighbor-joining (NJ) phylogenetic trees of the class I peptide-binding domains (α1 and α2) (Fig. 2A) and IgSF (α3 domain) (Fig. 2B) were constructed in MEGA 6.06 (26) using the p-distance model with uniform rates among sites and pairwise deletions for the gaps/missing data. Branch support was evaluated using 1000 bootstrap replicates of the data. Human, chicken, and Chinese alligator CD1 molecules were used as outgroups because they are among the most divergent vertebrate class I lineages (17). To infer the relationships of chondrichthyan MHC class I genes with those of other vertebrates, we also included some representative lineages from other vertebrates (Fig. 2A, Supplemental Table II).
To compare the expression between the new lineage (UEA) and the class Ia UAA, we performed Northern blotting. Ten micrograms of total RNA from various nurse shark tissues (brain, epigonal, gill, gonad, liver, muscle, pancreas, spiral valve, spleen, stomach, thymus, and WBCs) were electrophoresed on denaturing 1% agarose gel. The RNA was then transferred onto nitrocellulose membranes as previously described (2). Hybridization was done using a [32P]-labeled probe encoding the α3 domains of UAA and UEA as well as the loading control nucleoside diphosphate kinase probe (27) under high-stringency conditions (50% formamide, 6× SSC, 5× Denhardt solution, 10 mM EDTA, 5% SDS, and 100 µg/ml sheared salmon sperm DNA) (2). The membrane was air dried and exposed to x-ray film until the desired signal strength was obtained.
Total RNA samples from all tissues mentioned above (except spiral valve because of an insufficient amount of RNA left) were used in transcriptome sequencing (RNAseq) as a complementary and quantitative approach to assess the relative gene expression of the UEA lineage. Tissue-specific mRNA libraries were obtained using poly-A enrichment and sequenced on a NovaSeq platform aiming at 30 million reads per tissue (150PE). Read quality was checked using the FastQC software, but no further filters were needed given the high quality of the data. Raw reads from the 11 libraries were used to assemble the nurse shark transcriptome through SPAdes 3.13 (24) with specific parameters for RNAseq (-rna), using k-mer = 73 as the best k-mer tested. The obtained transcripts were blasted against UEA and UAA sequences, retrieved as indicated in the “Bioinformatic filtering” section, and from G. cirratum and the close relative Rhincodon typus, to select all variants for these genes in the new nurse shark transcriptome.
Gene expression quantification was performed for UEA and UAA selected transcripts (as in the Northern blot) using bioinformatic mapping of the paired reads with RSEM (28). We used α3 domain sequences as reference for the mapping to allow direct comparison with the Northern blot results and because this domain is more conserved within lineages and may serve as a better proxy for lineage-wide expression in multicopy gene lineages such as UEA and UAA.
Southern blots were performed to estimate the presence/absence and number of UEA genes in various chondrichthyan species (including species without genome and transcriptome data, i.e., spotted ratfish Hydrolagus colliei, cownose ray Rhinoptera bonasus, thornback ray R. clavata, and horned shark H. francisci) and to confirm the data obtained with the bioinformatic searches (in species with genome and transcriptome data, i.e., little skate L. erinacea, nurse shark G. cirratum, bamboo shark Chiloscyllium punctatum, sand-tiger shark Carcharias taurus, lesser spotted dogfish Scyliorhinus canicula, lemon shark Negaprion brevirostris, spiny dogfish S. acanthias, and blue shark Prionace glauca). Ten micrograms of genomic DNA (gDNA) extracted from shark erythrocytes were digested with BamHI for 48 h, followed by electrophoresis in a 0.8% agarose gel. The digested gDNA was transferred onto a nitrocellulose membrane via capillary transfer, and a [32P]-labeled probe encoding the α3 domain of nurse shark UEA was used for hybridization under low-stringency conditions (30% formamide, 6× SSC, 5× Denhardt solution, 0.5% SDS, and 100 µg/ml sheared salmon sperm DNA) (2). The membrane was exposed to x-ray film for different periods to obtain the optimal signal strength.
Identification of the UEA lineage in all shark species tested (Selachii) and its general biochemical properties
Taking advantage of the new wealth of data available for transcriptomes and genomes in cartilaginous fish (Supplemental Table I), we searched for MHC class I genes in an extended taxonomic representation of sharks, rays, and chimaeras. Our searches (representative sequences in (Fig. 1; the entire list of sequences in Supplemental Table II, Supplemental Fig. 1; and available at https://doi.org/10.6084/m9.figshare.14932545.v2) yielded a novel group of sequences clearly distinct from the previously reported cartilaginous fish MHC class I lineages. These sequences were found in all shark species with transcriptome and/or genome data but not in any of the chimaera and ray taxa studied. Following MHC nomenclature for nonmammalian vertebrates (29) and taking into consideration the previously reported cartilaginous fish lineages, we christened this new lineage UEA.
The nine invariant residues that bind to the N and C termini of bound peptides in almost all classical class I PBR are not conserved in UEA sequences (Table I), one of the features that defines UEA as a nonclassical molecule (16, 18). The UEA α3 domain sequences have the typical disulfide bridge found in nearly all IgSF domains (Fig. 1; Supplemental Fig. 1). Of the two conserved salt bridges in the α1 (H3 and D29) and α2 domains (H93 and D122) found in most class I molecules (including all lineages in chondrichthyan fish), only the UEA α2 domain bridge is present. Unexpectedly, the α2 domain disulfide bridge present in all other vertebrate class I molecules (and β1 domain of class II), an important element for its secondary and tertiary structure (30), is absent in this new lineage. The canonical α2 domain cysteines are replaced with hydrophobic residues in UEA, similar to the few cases in which the disulfide bond is lost in IgSF domains (13). All UEA sequences have the typical class I Asn-linked glycosylation site at the C terminus of the α1 domain (N87), an Asp/Glu-containing CP, and a hydrophobic TM region (Fig. 1; Supplemental Fig. 1). A second putative Asn-linked glycosylation site (N238) is detected in some UEA IgSF domains between the C and D β strands, and there is an insertion of 5 aa between the A and B IgSF β strands. The length of the CYT in different UEA sequences (Fig. 1) varies greatly when compared with the conserved size of the UAA tail. Although the alignment of the CYT could not be achieved because of its high sequence divergence, the typical UAA tyrosine (Y320), which regulates endocytosis and sorting (31), is not present in most of the UEA sequences. Additionally, the serine content of the CYT is lower than in the other cartilaginous fish MHC class I lineages. Considering the residues that interact with CD8 (positions 222, 223, 224, 225, 241; and 243; (Fig. 1; for examples, see Refs. 17, 32, and 33), most are conservatively changed (i.e., different amino acids but with the same properties: charge and polarity) with the exception of E225 (Q/L in UEA). The same conservation is found for the 30 residues that interact with β2m (Fig. 1) (32, 34). Overall, the UEA PBR has less than 30% amino acid identity (on average) to the PBRs of UAA, UBA, UCA, and UDA, and the UEA IgSF domain has 35–41% amino acid identity to the IgSF domains of these four lineages (Table II).
|Between Lineages .|
|.||UAA .||UBA .||UCA .||UDA .|
|PBR (α1 + α2)|
|PBR + IgSF (α1 + α2 + α3)|
|Within Lineage (UEA)a|
|Between Lineages .|
|.||UAA .||UBA .||UCA .||UDA .|
|PBR (α1 + α2)|
|PBR + IgSF (α1 + α2 + α3)|
|Within Lineage (UEA)a|
Sequence information listed in Supplemental Table II.
UEA falls into three distinct sublineages
As has been observed in other classical and nonclassical class I lineages (32), UEA is more diverse in the PBR α1 compared with the α2 domains, and the IgSF is extremely well conserved (average amino acid identity: 48.8% for α1, 69.3% for α2, and 86.4% for IgSF) (note the relative size of the branch lengths in the phylogenetic tree in (Fig. 2; Table II). UEA can be clearly subdivided into three sublineages, which we designated as UEA-A, UEA-B, and UEA-C, considering the level of amino acid sequence identity in the PBR (Table II) and the residues present in each subgroup (Fig. 1, Supplemental Fig. 1). The sublineage UEA-C (average 56.0% identity) is the most diverse in the α1 domain when compared with UEA-A (average 75.2% identity) and UEA-B (average 75.9% identity). In contrast, the α2 and IgSF domain show similar levels of amino acid sequence identity among sublineages (average values for UEA-A, -B, and -C: 72.9%, 79.4%, and 71.5% for the α2 domain and 86.4%, 87.3%, and 89.2% for the IgSF domain, respectively). Overall, the amino acid alignment (Fig. 1) shows that sublineages UEA-A and -B are distinguished by many “fixed” positions (overall) when compared with the UEA-C sublineage, which is equally similar to both UEA-A and UEA-B. This characteristic is clearly visible when the phylogenetic trees are separated by domains (see (Fig. 2, Supplemental Fig. 2, Discussion). Although it is possible to detect some distinctive positions (e.g., position 59 in which UEA-B and C have Tyr, His, or Phe, whereas UEA-A have Asn or Asp; position 70 in which UEA-A and -C share the Lys or Arg residues, whereas lineage B has Leu or His residues), the majority of the sequences is highly variable (Fig. 1, Supplemental Fig. 1). There are some conserved residues in the different sublineages that potentially could bind to the TCR (T in (Fig. 1, based on Ref. 35). For example, three residues in the α2 domain of UEA-A at 161–163 (all Glu) are well conserved within the sublineage and could interact with a conserved region of TCRs.
Evolutionary relationship among MHC class I lineages, sublineages, and domains
To understand the evolutionary relationship among all chondrichthyan MHC class I amino acid sequences, NJ phylogenetic trees (Fig. 2) were constructed using either the PBR (α1 and α2 domains, Fig. 2A) or the IgSF (α3 domains, Fig. 2B), rooted with the divergent nonclassical class I molecule CD1. In both trees, UEA formed a single, well-supported clade (100% bootstrap value). Unlike UBA, UEA is clearly not derived from any class I lineage (Fig. 2) but has an old, divergent relationship with UCA in the PBR (Fig. 2A), especially in the α2 domain (Supplemental Fig. 2B).
Three UEA sublineages are clearly distinguished in the PBR tree. UEA-A and -B are each well-supported sublineages exhibiting some fixed positions (overall) when compared with the UEA-C sublineage. The UEA-C appears as a sister group of both UEA-A and UEA-B, which are somewhat more closely related. Domain-wise, the sublineages group precisely in the α1 domain tree, whereas some sequences are mixed in the α2 tree. Note that the UEA-A and -B sublineages have fewer sequences than UEA-C (11, 15, and 37 sequences per sublineage UEA-A, -B, and -C, respectively; Supplemental Table II), but they share a similar taxonomic representation, at least when comparing UEA-B and UEA-C (7, 12, and 13 species per sublineage UEA-A, -B, and -C, respectively). In the cartilaginous fish IgSF α3 domain tree (Fig. 2B), the lineages of class I genes within the cartilaginous fish were generally well supported but not the distinction of the UEA sublineages. Rather, the IgSF UEA sequences group according to taxon (species, family, or order) regardless of sublineage, suggesting within-species homogenization of this exon among the three ancient UEA sublineages, perhaps via gene conversion [concerted evolution (36)]. Indeed, the percentage of identity of IgSF nucleotide sequences among sublineages at the within-species level range from 88 to 100% (average range: 92–100%). This feature of differential evolutionary histories of the PBR and IgSF domain has been documented previously for many vertebrate class I families (see Discussion).
UEA is an MHC-linked, multigenic, likely nonpolymorphic lineage
We tested for UEA’s presence and gene number in different cartilaginous fish species via Southern blotting under low-stringency conditions with a UEA probe encoding the conserved IgSF α3 domain (“Chondroblot” in (Fig. 3). Consistent with our bioinformatic searches, UEA genes were found in all shark species tested (division Selachii) but not detected in Holocephalans (represented by ratfish H. colliei) or Batoids (represented by cownose ray R. bonasus, little skate L. erinacea, thornback ray R. clavata, and marbled electric ray Torpedo marmorata). Also consistent with multiple sequences retrieved per species with the bioinformatic searches, we detected a large number of bands in all shark species, indicative of a large multigene family.
As we have done in several other studies to examine MHC linkage in nurse shark families (5, 7, 37), we attempted to uncover restriction fragment length polymorphisms (RFLP) to use for segregation analyses (Fig. 4). Unfortunately, despite the multigenic nature of UEA, five different restriction enzymes (BamHI, EcoRI, HaeIII, MindIII, and RsaI) yielded no RFLP, suggesting that this family is relatively nonpolymorphic. Bioinformatic searches using the genome of the whale shark (Rhincodon typus) (38) revealed at least 17 UEA genes located on 13 scaffolds (NW_018043287; NW_018046740; NW_018051575; NW_018034418; NW_018030408; NW_018035489; NW_0180603390; NW_018029627; NW_018034418; NW_018052231; NW_018052231; NW_018046740; and NW_018051575). One gene model (XM_020510578) in the sublineage UEA-C is in the same scaffold (NW_018030408) as the UDA gene (XM_020510579). Thus, based on our previous work showing linkage between UDA and UAA genes in nurse sharks (5), it is highly likely that UEA-C genes are also MHC linked. In two other scaffolds (NW_018060336 and NW_018034418), we found three and two UEA-C genes, respectively, suggesting that the genes belonging to the sublineage UEA-C are distributed in tandem across the MHC. The search performed in the whitespotted bamboo shark (C. plagiosum) genome revealed the presence of MHC class I genes on chromosome 37 (accession number CM012992) as previously reported by Zhang et al. (39). On this chromosome, we found two UAA loci, one UDA locus, and six UEA loci (four UEA-B and two UEA-C) (data not shown). The gene identification was performed with the α1 domain because it allows the best identification of the sublineages (Supplemental Fig. 2). Unfortunately, in the remaining species (Supplemental Table I), the available genome assemblies remain highly fragmented. In summary, the data support MHC linkage of at least some of the UEA genes.
Expression of the multicopy UEA lineage
The expression of the UEA lineage was analyzed in different nurse shark tissues and compared with the classical MHC class I UAA via Northern blotting (Fig. 5) and RNAseq (Fig. 6). A UEA IgSF α3 probe was used to detect the expression of all sublineages simultaneously because this domain is the most conserved among all sublineages (Table III). Distinctive banding patterns were detected in each tissue, consistent with the expression of multiple genes in this large class I family, in addition to tissue-specific expression patterns (Fig. 5). A band at ∼1.9 kbp was detected in several tissues, which may be a major transcript or group of transcripts. The Northern blot also shows other bands between 3.6 and 5.0 kbp in the epigonal and thymus and above 6.6 kbp in epigonal, spleen, and WBC.
|PBR .||UAA .||UBA .||UCA .||UDA .||UEA .||UEA-A .||UEA-B .||UEA-C .||CD1 .|
|192 aa (α1 and α2)||42.1||48.6||45.0||38.8||42.6||43.8||41.7||42.3||50.1|
|Number of sequencesc||3||3||2||3||63||11||15||37||3|
|PBR .||UAA .||UBA .||UCA .||UDA .||UEA .||UEA-A .||UEA-B .||UEA-C .||CD1 .|
|192 aa (α1 and α2)||42.1||48.6||45.0||38.8||42.6||43.8||41.7||42.3||50.1|
|Number of sequencesc||3||3||2||3||63||11||15||37||3|
The CD1 lineage is used for comparison because it is the most studied molecule with a hydrophobicity groove (61).
The nine invariant residues that bind to the N and C termini of bound peptides found in the classical class I.
The previous nine invariant residues and the 28 other predicted peptide-binding residues.
The RNAseq data corroborated the Northern blot results, retrieving multiple UEA transcripts (n = 101) with lengths ranging between 1111 and 7688 bp. Transcript size showed a bimodal distribution at 1.5–2.0 kbp and 3.0–5.0 kbp (Fig. 6A), consistent with the Northern blot. Transcript size variation was due to a combination of variable intron sizes retained in many of the transcripts and to variable 5' and 3' untranslated regions. Expression levels across tissues based on unique α3 query sequences (available at https://doi.org/10.6084/m9.figshare.14932545.v2) was higher for the single copy UAA compared with the multicopy UEA (up to 10× higher in some tissues), with the former exhibiting marked variability among tissues, whereas the latter showed more even expression levels (Fig. 6B). Overall, UAA and UEA had the highest expression in gill and stomach and the lowest in muscle and pancreas (Figs. 5, 6B).
UCA is distantly related to UEA and is detected only in order Squaliformes
The UCA lineage was identified in 2003 by Wang et al. (4) in the spiny dogfish S. acanthias as a highly divergent class I gene. Despite our targeted searches for UCA sequences in all chondrichthyan species available in the genome and transcriptome datasets, we only detected UCA in another squaloid shark species, the velvet-belly lanternshark Etmopterus spinax. Both species are Squaliform sharks, and thus, the extant UCA lineage appears to be exclusive to this order of cartilaginous fish (Supplemental Table I). UCA is likely represented by a single locus because all sequences we detected were identical; of 84 MHC class I sequences from the spiny dogfish transcriptomic data, six were UCA with 100% identity. Only a single sequence of UCA was detected in the velvet-belly lanternshark transcriptome from a total of eight MHC class I sequences. Phylogenetic trees of the PBR show that UCA and UEA indeed share an early common ancestor (Fig. 2, Supplemental Fig. 2), but both lineages are found in the Squaliform sharks; note that the lack of the canonical disulfide bridge in the α2 domain is shared by UCA and UEA. The potential phylogenetic scenario for emergence and maintenance of the UCA and UEA lineages is addressed in the Discussion (Fig. 7). Interestingly, UCA shows the highest level of hydrophobicity for chondrichthyan MHC class I lineages in the conserved “peptide”-binding sites, comparable to the highly hydrophobic groove of the CD1 lineage (Table III). Taking in consideration the nine UAA invariant residues that bind to the N (light gray) and C termini (dark gray) of bound peptides, the percentage of hydrophobic residues range from 22.2% (UDA) to 66.7% (UCA) and 80.8% (CD1). All potential binding residues (37 aa) have the percentage of hydrophobic residues ranging from 35.1% (UDA) to 71.6% (UCA) and 69.1% (CD1). These results contrast with the percentage of hydrophobic residues obtained for the entire PBR (∼180 residues), in which all lineages are similar (38.8% UDA, 48.6% UCA, and 50% CD1).
Unique features of the UEA class I family
This new MHC class I lineage, UEA, is present in all shark species (Selachii) but not in rays (Batoidea) or chimaeras (Holocephali). UEA has most of the typical features of MHC class I lineages such as leader peptide, α1, α2, and α3 domains, and CP/TM/CYT regions. These molecules also show the N-glycosylation site in the α1 domain, the salt bridge in the α2 domain, and the disulfide bridge in the α3 domain. However, at least two important elements, the salt bridge in the α1 domain and, especially, the disulfide bridge in the α2 domain, are not conserved in UEA (Fig. 1, Supplemental Fig. 1). The absence of the salt bridge was also reported for CD1 and PROCR (17), lineages that diverged from classical class I at least 350 MYA. However, CD1/PROCR also lost the second salt bridge in the α2 domain, and they have preserved the disulfide bridge in the α2 domain (17). This disulfide bridge, formed between the thiol groups in two α2 domain Cys residues, is important for the secondary and tertiary structure of class I proteins (30). This feature has been lost in both UEA and UCA, so far the only two MHC class I lineages or genes with this condition. Further studies are needed to predict the consequences of this absence on the structure of the protein, but disruption of this disulfide bridge in HLA-A class I proteins results in a delay in the maturation of the H chain during biosynthesis, decrease in the total amount of mature H chains within the cells, and lower levels of class I surface expression (40). A novel potential N-glycosylation site is found in some UEA sequences in the α3 domain, mostly from nurse shark and spiny dogfish (Fig. 1, Supplemental Fig. 1). The role of this potential N-glycosylation site is unknown. However, the well-conserved Asn78 N-linked glycan is frequently associated with class I protein structure and quality control events in the endoplasmic reticulum (41–43). One of the features that remains unclear is UEA’s associations with β2m, CD8, and/or TCR, like the classical class I molecules (Fig. 1, Supplemental Fig. 1); although most of the potential interacting residues are conservatively changed (maintaining the basic amino acid properties), there are several residues that completely change the charge or polarity, making the association with these molecules unclear.
As is seen both by the presence of multiple sequences from the same species, and the Southern blotting data (Fig. 3), the UEA lineage is a large, multigene family. UEA is subdivided in three sublineages, UEA-A, -B, and -C, based on differences in the PBR (Figs. 1, 2A, Supplemental Figs. 1, 2). The three sublineages were found in transcriptome datasets from different shark taxa, and thus, all are expressed. The UEA-C sublineage has the most divergent sequences in the α1 domain and the most conserved sequences in the IgSF α3 domain (Table II). UEA-A and -B are each well-supported sublineages exhibiting some fixed positions (overall) when compared with the UEA-C sublineage. Although bootstrap support values are not high, the UEA-C appears as a sister group to another cluster bearing UEA-A and UEA-B as closer relatives. Preliminary data from Southern blotting in nurse sharks suggests that the UEA family is relatively nonpolymorphic, as we detected no UEA RFLP (Fig. 4), whereas UAA RFLP can be readily detected (7). Low polymorphism is generally expected for nonclassical class I genes, and this will be a topic of future studies.
Data mined from the emerging cartilaginous fish genome projects show that at least some of the UEA genes are MHC linked. UEA genes were found in tandem in the genomes of whale shark (38) and whitespotted bamboo shark (Y. Zhang, H. Gao, H. Li, J. Guo, M. Wang, Q. Xu, J. Wang, M. Lv, X. Guo, Q. Liu, et al, manuscript posted on bioRxiv), and thus, it is likely that homogenization of the IgSF α3 domain exon may occur via gene conversion of these closely linked genes. With such a complex gene family, it is likely that numerous gene conversions and recombinational events take place within this lineage. Gene segmental exchange has been widely reported for MHC genes in other species (44–46), and tandem class I gene organization is common in gnathostomes (32, 47–50) (see below).
Relationship between UCA and UEA
The PBR phylogenetic tree suggests that both UCA and UEA are independent and evolutionary old lineages not derived from any previously reported class I lineage, unlike the UBA lineage that is clearly derived from UAA (Fig. 2, Supplemental Fig. 2). In turn, UEA and UCA are ancestrally related in the PBR, as seen in the phylogenetic trees, and both lineages lack the canonical disulfide bridge in the α2 domain (Fig. 1, Supplemental Fig. 1). However, the exon encoding the UCA and UEA IgSF domains is not homogenized, suggesting a long and independent evolutionary history (Fig. 2). Based on the above features and the presence of both lineages only in Selachii, we hypothesize two evolutionary scenarios (Fig. 7B): 1) UEA emerged in the ancestor of sharks at the same time as UCA, but UCA was lost in superorder Galeomorphii; or 2) UCA emerged from a UEA ancestor within superorder Squalomorphii or possibly within only one of its orders (i.e., Squaliformes). The UCA gene was (so far) only found in two species of the order Squaliformes that are also the only representatives of the superorder Squalomorphii in our dataset; thus, UCA is found in one order but maybe is present in all Squalomorphii orders. In scenario 1, both UEA and UCA would have originated ∼350 MYA in the ancestor of sharks, whereas in scenario 2, UCA would be not older than ∼327 million y (20) (Fig. 7A). UCA’s long divergence time could explain its distinctive features from UEA, in addition to a possibly different function (and selective pressure) consistent with its highly hydrophobic binding site. It should be reiterated that the squaloid sharks have retained both the UCA and UEA lineages, suggesting that the UEA sublineages emerged independently and not from UCA.
Where does UEA fit into the big class I picture?
The main features of UEA lineage are like certain nonclassical class I genes in other vertebrates, namely the XNC lineage in amphibian Xenopus (32, 51, 52), nonclassical class I genes in the amphibian Ambystoma (axolotl) (49), and UT in marsupials and monotremes (50). The XNC is a highly complex class I family with several sublineages, much like UEA. XNC and UEA genes have very high variability in PBR, with the α1 domain being more diverse and gene specific than the α2 domain. Only one residue of the nine conserved classical class I peptide-binding residues that bind the N and C termini of bound peptides are conserved: Y171 for Xenopus XNC and W174 for UEA. The α3 IgSF domains are very similar among the different members of the family for both XNC and UEA, suggestive of concerted evolution by gene conversion. Such homogenization of the exon encoding the α3 domain has been widely reported for other vertebrate species from humans to fishes (53–57), and the XNC and UEA data typify the fundamentally differential selection pressures on the PBR (even differences between the α1 and α2 domains) and the IgSF domains. The XNC genes are minimally polymorphic (32), and our preliminary data suggest the UEA genes also have limited polymorphism.
Note that there are also differences between XNC and UEA. Although the gene conversion of the α3 domain is specific to UEA and does not extend to any of the other shark class I lineages (even the related UCA), there seems to be concerted evolution of XNC with the classical class I gene (32), which is a half chromosome away from XNC (58), and even with Xenopus class I genes on other chromosomes (59). It will be important to perform fine mapping of the cartilaginous fish class I genes on the MHC chromosomal region; during the 400 million y since the emergence of the five class I lineages, interesting chromosomal “blocks” may have arisen to safeguard the integrity of particular lineages.
As expected for a multigene family, UEA shows a very complicated expression pattern (Figs. 4, 5), much like the XNC genes. In addition to the expression of multiple transcripts of different sizes consistently shown by the Northern blot and RNAseq data, the Northern blot also suggests a tissue-specific expression with the presence of different major bands in various tissues. This pattern is in part supported by the RNAseq data; we found that only half of the UEA α3 transcripts were expressed in seven or more tissues (data not shown). Likewise, different XNC genes or sublineages have a tissue-specific expression (51), and two XNC genes have been shown to activate NKT cells (52) for defense against tumors and intracellular bacteria. NKT cells have been best studied in mammals for their recognition of the CD1 and MR1 nonclassical class I molecules and also for recognition of tumors and bacteria. During T cell development, mammalian NKT cells are not positively selected by the thymic epithelium but rather by nonclassical class I molecules expressed by cortical thymocytes (19), and indeed, many of the XNC genes are expressed by cortical thymocytes (60). UEA shows a complex expression pattern in the thymus (Fig. 5), and we speculate, based on precedent, low levels of polymorphism and its relatively hydrophobic binding site, that UEA (and UCA) functions to stimulate innate T cells for antibacterial defense. The elasmobranch UBA lineage, also multicopy but with a relatively hydrophilic PBR, is also a candidate for ligands stimulating NKT cells (2).
Regardless of UEA’s function, the discovery of this multigenic class I family, in combination with the other four chondrichthyan class I lineages, further reveals the complexity of adaptive, and likely innate, immunity in the cartilaginous fish. This work demonstrates a previously unrecognized “Big Bang” of class I lineages at the origins of gnathostome adaptive immunity as no MHC molecules have been detected in jawless fish (61). Three other class I lineages are present in the cartilaginous fish (Almeida et al., manuscript in preparation), suggesting that when class I genes first emerged in the vertebrates, there was a rapid emergence of lineages, more complex than in most gnathostome classes. Delving into the functions of these lineages may reveal primordial functions of the class I/T cell system.
This work was supported by Portuguese Foundation for Science and Technology Award PD/BD/114542/2016 (T.A.), Contract IF/00376/2015 (P.J.E.), Contract DL57/2016 (A.V.), and Project Grant PTDC/ASP-PES/28053/2017, FEDER Funds through Operational Competitiveness Factors Program-COMPETE Contract POCI-01-0145-FEDER-022184 (A.M.M.), and National Institutes of Health Grants AI140326-26 (Y.O.) and AI02877 (M.F.F.).
The online version of this article contains supplemental material.
Abbreviations used in this article
- class Ia
classical MHC class I molecule
- class Ib
nonclassical class I molecule
million y ago
restriction fragment length polymorphism
Sequence Read Archive
Transcriptome Shotgun Assembly
Whole Genome Sequence
The authors have no financial conflicts of interest.