Ab “ultralong” third H chain complementarity-determining regions (CDR H3) appear unique to bovine Abs and may enable binding to difficult epitopes that shorter CDR H3 regions cannot easily access. Diversity is concentrated in the “knob” domain of the CDR H3, which is encoded by the DH gene segment and sits atop a β-ribbon “stalk” that protrudes far from the Ab surface. Knob region cysteine content is quite diverse in terms of total number of cysteines, sequence position, and disulfide bond pattern formation. We investigated the role of germline cysteines in production of a diverse CDR H3 structural repertoire. The relationship between DH polymorphisms and deletions relative to germline at the nucleotide level, as well as diversity in cysteine and disulfide bond content at the structural level, was ascertained. Structural diversity is formed through (1) DH polymorphisms with altered cysteine positions, (2) DH deletions, and (3) new cysteines that arise through somatic hypermutation that form new, unique disulfide bonds to alter the knob structure. Thus, a combination of mechanisms at both the germline and somatic immunogenetic levels results in diversity in knob region cysteine content, contributing to remarkable complexity in knob region disulfide patterns, loops, and Ag binding surface.
The structural diversity of Abs, which for nearly all species occurs within the Ig scaffold, enables the vertebrate adaptive immune system to neutralize a myriad of foreign Ags. The third H chain complementarity-determining region (CDR H3) is encoded by rearranged variable (VH), diversity (DH), and joining (JH) gene segments and is the most diverse part of the Ab molecule (1–5). The length of the CDR H3 loop varies in different species and is usually important in Ag binding (6). The Ag binding site (paratope) of most human Abs is flat or undulating, with CDR H3s typically ranging from 8 to 16 aa in length (5, 7, 8). The ability to bind certain classes of Ags, including viral spike and other glycoproteins, has been attributed to longer and protruding CDR H3 structures on human Abs (9–14). Bovine Abs have much longer CDR H3 regions than any other species examined, with a typical average length of more than 23 aa. Remarkably, ∼10% of Bos taurus Abs, in all isotypes, have exceptionally long CDR H3 regions that are 40 to 70 aa in length (1, 5, 15–18). These ultralong CDR H3s may enable binding to concave or cryptic epitopes that Ag binding sites with shorter CDR H3s cannot easily access. Cows are the only vertebrates studied that can mount a rapid, broadly neutralizing response against the engineered HIV gp140 SOSIP Env trimer, and several mAbs with broadly neutralizing activity had ultralong CDR H3s (19). The ultralong CDR H3 of one such Ab, NC-Cow1, has a unique ability to navigate through the glycan shield of the HIV spike to reach the cryptic CD4 binding site (16, 19, 20). Cows, therefore, have an unusual humoral immune system characterized by Abs with ultralong CDR H3 regions that may have unique protective properties against certain Ags.
Compared with other species, the bovine Ab repertoire is limited in terms of the number of germline genetic components and thus has a lower potential for combinatorial diversity (5, 15, 21–23). The functional gene segments of the bovine Ab repertoire appear to include only 12 VH, 23 DH, and 2 JH at the H chain locus; 25 Vλ and 3 Jλ at the λ locus; and 8 Vκ and 3 Jκ at the κ locus (5, 21, 24, 25). There is even less potential for combinatorial diversity in bovine H chains with ultralong CDR H3s, because all Abs in this subset appear to use the same germline VH, DH, and JH gene segments: IGHV1-7, IGHD8-2, and IGHJ2-4 (15, 16, 21–24). The single very long DH gene segment has two polymorphic alleles: IGHD8-2*01 and IGHD8-2*02 (22, 25, 26). The frequencies of these alleles in the bovine population and their relevance to ultralong CDR H3 diversity have not been explored. Thus, there are severe limitations in germline potential diversity for ultralong CDR H3 Abs; however, polymorphic DH regions could contribute to diversity across the B. taurus population or specifically in heterozygous animals.
Conventionally, Ab repertoire diversity is generated through V(D)J recombination (including junctional insertions and/or deletions) before Ag exposure and somatic hypermutation (SH) after Ag exposure (3, 4, 15, 27). The limited potential for combinatorial diversity of bovine Abs, particularly the subset with ultralong CDR H3s, suggests that diversity of the bovine Ab repertoire is primarily achieved through SH and possibly additional unknown mechanisms. Several species, including cows, however, activate SH before Ag exposure, thus allowing further diversification of the primary repertoire (28, 29). Unlike Abs from other species (and bovine Abs with shorter CDR H3s), mature ultralong CDR H3 Abs have low amino acid variability in the CDR H1 and CDR H2 regions (15, 21). Diversity in the bovine ultralong Ab repertoire is concentrated in the CDR H3 region, which is largely encoded by the DH gene (15, 21). Studies of the knob domain have suggested that the CDR H3 may be the only CDR used for Ag recognition by bovine Abs with ultralong CDR H3s (19, 20, 30–32). This suggests that the CDR H1, H2, L1, L2, and L3 of ultralong Abs may not interact with Ag and may function primarily to support and stabilize the CDR H3. Indeed, CDR H3s can be transplanted between Abs and retain function (19), and knobs can be cleaved from the Ab and still bind Ag (32). Thus, both the binding and diversity properties of ultralong CDR H3 Abs appear to reside within the ultralong CDR H3, as opposed to the rest of the molecule.
Crystal structures have revealed a conserved structural paradigm for bovine ultralong CDR H3s; they are composed of a β-ribbon “stalk” upon which sits a disulfide-bonded “knob” minidomain (1, 5, 16, 33). The extended “stalk” protrudes from the typical Ab surface and is formed by two antiparallel β-strands, which can vary in length. The “knob” domain contains disulfide bonded cysteine residues and three short antiparallel β-strands at its core (1, 5, 16). A CTTVHQ motif encoded by the 3′ end of IGHV1-7 initiates the ascending β-strand of the stalk, with junctional diversity residues composing the rest of the ascending β-strand (5, 16, 21). The DH gene segment encodes the knob region and a portion of the stalk’s descending β-strand, which usually contains alternating stacking aromatic residues. The C-terminal end of the descending β-strand is encoded by IGHJ2-4 (5, 16, 21). The underlying genetic features encoding the structural architecture are established, but diversity-generating mechanisms that alter cysteine diversity and therefore the disulfide bond patterns of the knob have yet to be deciphered in detail.
The CDR H3 knob regions are quite diverse in amino acid sequence, shape, orientation, and disulfide patterns (1, 5, 16). Although the presence of three very short antiparallel β-strands appears conserved in knob regions based on available crystal structures, there is dramatic variability in amino acid sequence as well as overall sequence length (1, 5, 16). Knob regions typically have an even number of cysteines, and the sequence position of the first cysteine is nearly universally conserved (1, 5, 16). Otherwise, knob region cysteines are diverse in terms of their positions and participation in disulfide bond patterns (1, 5, 16). The IGHD8-2*01 sequence contains 19 RGYW/WRCY SH hotspots, recognized by activation-induced cytidine deaminase (AID), and 38 of 48 aa codons can mutate to a cysteine with a single-nucleotide point mutation (15). IGHD8-2*01 was observed to have a high frequency of internal deletions that altered cysteine content, with in-frame deletions surviving clonal selection (21). With the high density of RGYW/WRCY motifs in the DH region, more than 96% of these deletions overlapped an AID hotspot (21). These data suggest that the ability to genetically alter cysteine content and positions would have significant impact on the knob structure by altering disulfide-bonded loops.
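The hotspot and cysteine-codon properties described above can be illustrated with a minimal Python sketch. It scans a nucleotide sequence for RGYW/WRCY motifs (IUPAC codes: R = A/G, Y = C/T, W = A/T) and flags codons that are one point mutation away from a cysteine codon (TGT or TGC). The function names, the choice to count overlapping motifs on the given strand only, and the use of reading frame 0 are assumptions made for illustration; the sketch is not intended to reproduce the published counts for IGHD8-2*01.

```python
import re

# Lookahead patterns so that overlapping motifs are all reported.
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")
CYS_CODONS = ("TGT", "TGC")


def hotspot_starts(seq):
    """Return sorted 0-based start positions of RGYW or WRCY motifs."""
    seq = seq.upper()
    starts = {m.start() for pat in (RGYW, WRCY) for m in pat.finditer(seq)}
    return sorted(starts)


def one_step_from_cys(codon):
    """True if a non-cysteine codon differs from TGT or TGC at exactly one base."""
    codon = codon.upper()
    if codon in CYS_CODONS:
        return False
    return any(
        sum(a != b for a, b in zip(codon, cys)) == 1 for cys in CYS_CODONS
    )


def codons_one_step_from_cys(seq):
    """Return 0-based codon indices (frame 0) one point mutation from cysteine."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return [i for i, codon in enumerate(codons) if one_step_from_cys(codon)]
```

For the toy sequence GGTTATAGCTAA (not a real DH sequence), `hotspot_starts` reports motifs starting at positions 0 and 6, and the first three codons (GGT, TAT, AGC) are each one substitution away from a cysteine codon.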
A key mechanism for generating knob region structural diversity appears to involve changes in cysteine position and disulfide bond pattern that arise through SH by means of point mutations to form cysteine codons and internal deletions that alter cysteine content or position (1, 16, 21). In this regard, the relationship between cysteine location, relative to the germline at the sequence level, and disulfide bond pattern on the structural level has not been fully investigated. In addition, differences in knob structure resulting from the use of each DH allele have not been examined. Furthermore, the impact of nucleotide deletions on ultralong CDR H3 structures, and particularly cysteine position, is unknown. A more thorough understanding of how changes in ultralong DH region cysteine content at the sequence level result in certain disulfide bonding patterns will be useful for understanding repertoire generation and development as well as for Ab engineering applications, such as knob region rational design or knob peptide molecular evolution. We analyzed the sequences and crystal structures of seven ultralong CDR H3s to determine germline DH allele identity and the location of nucleotide deletions (where applicable), and we compared disulfide bond patterns and DH region cysteine locations relative to germline cysteines. We sought to identify conserved cysteines and disulfide bond patterns on both sequence and structural levels. Our analysis revealed that the DH germline allele of five of the Abs is IGHD8-2*02, including three Abs with DH deletions, and IGHD8-2*01 for two of the Abs. We found that allele use and DH deletions, in conjunction with SH, contribute to diversity in DH region cysteine distribution and disulfide bond connectivity. Furthermore, we have analyzed H chain deep sequence data to determine germline cysteine conservation and somatically generated cysteine frequency. Although several germline cysteines are typically conserved on each Ab, new cysteines arise through SH and form unique disulfide bond connectivity patterns.
Materials and Methods
The variable regions encoding seven ultralong CDR H3 Abs that have published crystal structures were obtained from the Protein Data Bank (PDB): A01 (PDB: 5ILT), B11 (PDB: 5IHU), BLV1H12 (PDB: 4K3D), BLV5B8 (PDB: 4K3E), E03 (PDB: 5IJV), Bov6 (PDB: 6E9Q), and NC-Cow1 (PDB: 6OO0). The germline gene segments IGHV1-7, IGHD8-2*01 or IGHD8-2*02, and IGHJ2-4 were from IMGT (https://www.imgt.org/; accession no. KT723008).
Alignments and DH analysis
The CLUSTALW tool (https://www.genome.jp/tools-bin/clustalw) was used to align an in silico rearranged germline VDJ sequence pairwise with the DNA sequences. For the in silico rearranged sequences, coding and in-frame nucleotide sequences of IGHV1-7, IGHD8-2*01 or IGHD8-2*02, and IGHJ2-4 were assembled to create two electronic VDJ recombined germline sequences, one for each DH allele. Because V-D and D-J junctional regions do not exist in the germline, we added the VH-DH junctional nucleotides of each mature Ab gene to produce an identical VH-DH junction for the in silico rearranged germline sequence for each pairwise alignment. The slow/accurate alignment parameter and CLUSTAL output format were applied for all alignments. The weight matrix used was IUB for nucleotide alignments and BLOSUM for amino acid alignments.
The nucleotide and amino acid sequences of the two DH alleles, IGHD8-2*01 and IGHD8-2*02, were aligned to identify differences that may be reflected in the sequences of Abs that use each allele. This analysis included mapping RGYW and WRCY SH hotspots recognized by AID and nucleotides that can form a cysteine codon with a single mutation. The most probable germline DH allele used by each Ab was assigned on the basis of comparison of the alignment score and number of gaps in pairwise nucleotide alignments with each electronic germline sequence at each set of gap penalty values tested. Gap open penalty values ranged from 15 to 50, and gap extension penalty values ranged from 1 to 30 (Supplemental Table I). The electronic germline DH allele with a consistently higher alignment score and lower number of gaps, at each set of gap penalty values, was designated as the most probable DH allele used by the aligned Ab gene. The most probable location of DH region deletions on each Ab, relative to the germline DH of each, was assessed. We assumed that a single deletion event was more biologically probable than multiple events, so the deletion location was determined on the basis of alignments with gap penalty values that resulted in a single gap in the alignment.
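The allele-assignment rule described above can be sketched in Python as follows. The sketch assumes that the pairwise alignments have already been run externally (e.g., with CLUSTALW) and that, for each germline allele, the alignment score and number of gaps at each (gap open, gap extension) setting have been tabulated. The data layout, the function names, and the decision to treat an equal number of gaps as acceptable are illustrative assumptions rather than the exact procedure used.

```python
from typing import Dict, List, Optional, Tuple

Setting = Tuple[int, int]   # (gap_open_penalty, gap_extension_penalty)
Result = Tuple[float, int]  # (alignment_score, number_of_gaps)


def most_probable_allele(
    results: Dict[str, Dict[Setting, Result]]
) -> Optional[str]:
    """Return the allele that wins at every shared penalty setting, else None.

    An allele "wins" a setting if its score is higher and its gap count is
    not larger (ties in gap count are tolerated; this is an assumption).
    """
    a, b = results  # exactly two alleles are expected
    shared = sorted(set(results[a]) & set(results[b]))
    if not shared:
        return None

    def wins(x: str, y: str) -> bool:
        return all(
            results[x][s][0] > results[y][s][0]
            and results[x][s][1] <= results[y][s][1]
            for s in shared
        )

    if wins(a, b):
        return a
    if wins(b, a):
        return b
    return None


def single_gap_settings(table: Dict[Setting, Result]) -> List[Setting]:
    """Penalty settings whose alignment contains exactly one gap.

    These are the alignments used to locate a deletion under the
    single-deletion-event assumption.
    """
    return sorted(s for s, (_, n_gaps) in table.items() if n_gaps == 1)
```

Returning `None` when neither allele wins at every setting keeps ambiguous cases visible instead of forcing a call.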
Cysteine positions and disulfide bonding patterns within the DH region of each Ab, with respect to its corresponding germline DH allele, were analyzed at the sequence and structural levels. First, Abs with DH regions equivalent in length to each germline DH allele were grouped by allele use, and the two groups were aligned by amino acid sequence to each electronically rearranged germline sequence separately. In another analysis, amino acid sequences of Abs with DH region deletions were separately aligned with the corresponding germline DH sequence and with the DH sequence of another Ab that served as a germline surrogate. The germline surrogate served to compare changes in cysteine positions and disulfide bonding patterns on the structural level because structural data are not available for any germline ultralong CDR H3 Abs. Abs with available crystal structure data that are most homologous in sequence to each DH allele were selected as structural surrogates for each germline DH allele. The germline structural surrogates selected were BLV1H12 for IGHD8-2*01 and B11 for IGHD8-2*02. For each structural analysis, cysteines in the mature Ab that align with germline cysteines were indicated on the amino acid alignment, primary and secondary structure topology diagram, and crystal structure. Primary and secondary structure topology diagrams illustrate the β-strands and connecting loops [as defined by Stanfield et al. (1)]. DH region deletion locations, where applicable, are indicated on each germline surrogate and defined as deletion locations relative to the germline DH at the corresponding sequence positions on the surrogate. Variables evaluated in the structural analyses include cysteine position and distribution on primary and secondary structures, disulfide bonding pattern, and spatial distribution of cysteines on crystal structures.
Next-generation sequencing (NGS) analysis and allele determination
NGS analysis of 204 cow VH regions was performed according to Safonova et al. (34). Only ultralong CDR H3s were used for the NGS analysis. Ultralong CDR H3s were defined as CDR H3s exceeding 150 nt and derived from VDJ sequences with IGHV1-7 as the best V gene match. The percentage of ultralong CDR H3s varies from 0.3% to 5.4% across 204 samples, with the average percentage being 1.9%. We define the score of an alignment between two sequences as the number of differences (excluding starting and ending gaps) normalized by the alignment length. The average score of alignments between an IGHD8-2 allele and an ultralong CDR H3 varies from 0.38 to 0.45 across all subjects, thus indicating an extremely high mutation rate. To minimize the impact of poorly aligned ultralong CDR H3s, we apply two filters. First, we say that an ultralong CDR H3 is assigned to sequence s1 rather than sequence s2 if score(s2, CDR H3) - score(s1, CDR H3) > diffmin, where diffmin is a threshold initially set to score(s1, s2). If we replace s1 with IGHD8-2*01 and s2 with IGHD8-2*02, then diff = |score(s1, CDR H3) - score(s2, CDR H3)| is expected to be close to score(IGHD8-2*01, IGHD8-2*02) = 0.046. However, as Supplemental Fig. 1A shows, only ∼3.69% of ultralong CDR H3s (eight CDR H3s in absolute numbers) per sample have diff above 0.04. Similarly, only ∼5.85% of ultralong CDR H3s (39 CDR H3s in absolute numbers) have diff above 0.03. To recruit more CDR H3s and avoid possible impacts of clonal expansion, we decreased the value of diffmin to 0.02. The average percentage of used ultralong CDR H3s per sample is 14.83% (Supplemental Fig. 1A). This filter allows us to discard ambiguous ultralong CDR H3s that are aligned to IGHD8-2*01 and IGHD8-2*02 with similar scores. Second, because a high mutation rate makes alignments less accurate, we additionally discard ultralong CDR H3s with alignment scores above 0.3 because they correspond to alignments with less than 70% identity. Supplemental Fig. 1B shows a subject that is likely homozygous for allele IGHD8-2*01. Although all ultralong CDR H3s with low mutation rates (scores below 0.3) have lower alignment scores for allele IGHD8-2*01 than for IGHD8-2*02 (and thus are assigned to IGHD8-2*01), higher mutation rates create ambiguity, making many ultralong CDR H3s with scores above 0.3 equally distant from both alleles. Supplemental Fig. 1C shows that the latter observation holds true for a subject that is likely homozygous for IGHD8-2*02. To classify the state of IGHD8-2 for a single subject, we compute the fraction of nondiscarded ultralong CDR H3s assigned to IGHD8-2*01. If the fraction is above 0.9, we classify the subject as homozygous for IGHD8-2*01. If the fraction is below 0.1, we classify the subject as homozygous for IGHD8-2*02. Otherwise, we classify the subject as heterozygous. Supplemental Fig. 1E shows the distribution of the fractions of ultralong CDR H3s assigned to IGHD8-2*01 across 204 subjects.
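The scoring, filtering, and classification steps described above can be summarized in a short Python sketch. It assumes that each ultralong CDR H3 has already been pairwise aligned to both alleles and that the aligned strings (with "-" for gaps) are available. The function names are hypothetical, and two details are assumptions because the text leaves them open: the alignment length is taken after trimming terminal gap columns, and the 0.3 cutoff is applied to the score against the closer allele.

```python
ALLELE_01 = "IGHD8-2*01"
ALLELE_02 = "IGHD8-2*02"


def alignment_score(aligned_a, aligned_b):
    """Differences per aligned column, ignoring starting and ending gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    cols = list(zip(aligned_a, aligned_b))
    start, end = 0, len(cols)
    while start < end and "-" in cols[start]:
        start += 1
    while end > start and "-" in cols[end - 1]:
        end -= 1
    core = cols[start:end]
    if not core:
        raise ValueError("no overlapping columns")
    return sum(a != b for a, b in core) / len(core)


def assign_allele(score_01, score_02, diff_min=0.02, max_score=0.3):
    """Return the assigned allele, or None if the CDR H3 is discarded."""
    if min(score_01, score_02) > max_score:  # too mutated to assign reliably
        return None
    if score_02 - score_01 > diff_min:
        return ALLELE_01
    if score_01 - score_02 > diff_min:
        return ALLELE_02
    return None  # ambiguous: similar distance to both alleles


def classify_subject(assignments, upper=0.9, lower=0.1):
    """Classify zygosity from per-CDR H3 assignments (None entries ignored)."""
    kept = [a for a in assignments if a is not None]
    if not kept:
        return None
    fraction_01 = kept.count(ALLELE_01) / len(kept)
    if fraction_01 > upper:
        return "homozygous " + ALLELE_01
    if fraction_01 < lower:
        return "homozygous " + ALLELE_02
    return "heterozygous"
```

As a toy check, a subject with 19 CDR H3s assigned to IGHD8-2*01 and one assigned to IGHD8-2*02 has a fraction of 0.95 and is classified as homozygous for IGHD8-2*01.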
Germline DH allele length and cysteine position diversity
The two known DH alleles used to encode Abs with ultralong CDR H3 regions, IGHD8-2*01 and IGHD8-2*02, were aligned at the nucleotide and amino acid sequence levels (Fig. 1). The DH alleles are very similar with 95% identity at the nucleotide level, with several repeating units that encode G-Y-G. Both have a high density of RGYW SH hotspots (Fig. 1A). Notably, the sequence length of IGHD8-2*02 is 6 nt (2 aa) longer than IGHD8-2*01. The DH alleles are identical except at three positions: one position has a difference in nucleotide and amino acid identity (Thr in *01 and Ser in *02), and the other two positions consist of gaps in the IGHD8-2*01 sequence where an additional amino acid is inserted in the IGHD8-2*02 sequence. The sequence location of the IGHD8-2*02 serine insertion is between cysteines 2 and 3, resulting in a single amino acid position shift of cysteines 3 and 4 relative to the locations of cysteines 1 and 2 in IGHD8-2*01 (Fig. 1B). Cysteine 1 at position D2 and cysteine 2 at position D12 are conserved between both germline genes. The conserved sequence in both alleles encodes knob structural features; the CPDG motif at the N-terminal end forms turn 1 of the knob with the cysteine participating in a conserved disulfide bond, and the alternating aromatic residues at the C-terminal end form the descending strand of the stalk. Thus, the two DH alleles are identical in encoding key structural determinants (turn, disulfide bond, and descending stalk) but are divergent in their length and in two cysteine positions.
Structural conservation of germline cysteines
To determine the use and structural positions of germline-encoded cysteines in affinity-matured Abs, a DH allele was assigned to each of seven Abs with known structures based on nucleotide alignment scores (Supplemental Table I). The alignment scores of BLV1H12 (PDB: 4K3D) and NC-Cow1 (PDB: 6OO0) were higher with IGHD8-2*01. Alignment scores of A01 (PDB: 5ILT), B11 (PDB: 5IHU), BLV5B8 (PDB: 4K3E), E03 (PDB: 5IJV), and Bov6 (PDB: 6E9Q) were higher with IGHD8-2*02. Four Abs were encoded by full-length DH regions in the absence of deletions; BLV1H12 and NC-Cow1 with IGHD8-2*01 and B11 and A01 with IGHD8-2*02 (Fig. 2A and 2B, left). Although IGHD8-2*02 is longer than IGHD8-2*01, all three Abs with deletions (BLV5B8, E03, and Bov6) consistently had higher alignment scores with IGHD8-2*02, and each had fewer DH region nucleotide mismatches with IGHD8-2*02 than IGHD8-2*01. Of the seven Abs, two use IGHD8-2*01 (BLV1H12 and NC-Cow1), and neither contains SH-generated deletions, and five use IGHD8-2*02, with two being full length (B11 and A01) and three containing deletions (BLV5B8, E03, and Bov6).
Conserved disulfide bonds using germline cysteines could have integral roles in forming the knob structural scaffold, whereas new disulfide bonds and cysteine positions can arise through SH to generate diversity in knob surface shape. Therefore, we determined (1) the conservation of germline cysteines and (2) the conservation of germline-encoded disulfide bonds (Fig. 2). In order to assess disulfide bond and deletion structural positions, we used comparator Ab structures that encoded the full-length region of either IGHD8-2*01 (BLV1H12) or IGHD8-2*02 (B11). B11 was chosen as a comparator over A01 because it maintains the highly conserved first cysteine position. At least three of four germline DH cysteine positions, including germline cysteine position 4, are conserved on all four of the Abs encoded by full-length DH regions (i.e., without deletions). The fourth germline-encoded cysteine, located on strand 2 of the β-sheet core, or its adjacent loops, forms a disulfide bond with the first DH cysteine. Germline cysteine position 1 is conserved on BLV1H12, NC-Cow1, and B11. A01 is a rare exception in bovine H chains because this cysteine is highly conserved in ultralong CDR H3 sequences. A01 has a glycine at this location; DH cysteine 1 in A01 is instead located on strand 1 and forms a disulfide bond with the fourth DH cysteine (located 5 aa downstream from strand 2). The other conserved disulfide bond is between germline cysteine 2 and an SH-generated cysteine on or within 2 aa of the third loop turn. Germline cysteine position 2 is located on loop 2 of both IGHD8-2*01 Abs (Fig. 2A) and on strand 1 of A01, the IGHD8-2*02 Ab with germline cysteine 2 conserved (Fig. 2B). Thus, the second germline cysteine is somewhat structurally conserved between IGHD8-2*01 and IGHD8-2*02 Abs but can be located on either a loop (IGHD8-2*01) or a strand (IGHD8-2*02). The relative sequence and knob structural position of the cysteines participating in these bonds are conserved on Abs using each germline DH. Other DH cysteine positions and disulfide bonds are diverse within the conserved scaffold of three antiparallel β-strands connected by loops of varying length. For example, a disulfide bond between germline cysteine 3 (on loop 2) and SH cysteine 5 (on strand 2) is distinct to BLV1H12 (Fig. 2A), and a disulfide bond between SH cysteine 5 (on strand 2) and SH cysteine 6 (on loop 3) is unique to B11 (Fig. 2B). These examples illustrate how changes in additional cysteine positions and disulfide bond patterns through SH serve as mechanisms for generating diversity in knob domain shape.
Structural locations of deletions
To determine the structural location of DH region deletions, we evaluated deletion positions relative to the structure of an Ab with a full-length DH (Fig. 3). All Abs in this analysis use germline IGHD8-2*02. In BLV5B8, three of four germline cysteines are conserved, and a deletion occurs C-terminal to the last germline cysteine (Fig. 3A). Two SH-generated cysteines arose downstream of the last germline cysteine, and these two new cysteines form a disulfide bond with each other. E03 only has two cysteines, which form a single disulfide bond, and three germline DH cysteines are deleted (Fig. 3B). For Bov6, cysteine 2 forms a disulfide bond with cysteine 3, located on loop 2. Two germline DH cysteines are deleted on Bov6; however, SH-generated cysteines have also formed outside of the deletion locations (Fig. 3C). The disulfide bond between the first cysteine and a cysteine located on strand 2 is conserved on all the Abs. Thus, because of the deletion, the conserved disulfide between germline cysteines 1 and 4 was disrupted where a new cysteine replaces germline 4 in strand 2. Germline cysteines 2, 3, and 4 were deleted on E03 (Fig. 3B), and germline cysteines 3 and 4 were deleted on Bov6 (Fig. 3C); however, an SH-generated cysteine arose on strand 2 that forms a disulfide bond to cysteine 1 on both Abs. To summarize, deletion events both delete and alter the positions of germline-encoded cysteines; however, the conserved disulfide bond between germline cysteine 1 and an SH cysteine on strand 2 is maintained.
DH allele use and germline cysteine conservation
To analyze patterns of germline cysteine conservation, we processed repertoire-sequencing data from IgG Ab repertoires of 204 cows (34). For each repertoire-sequencing dataset, we extracted ultralong CDR H3 regions and aligned them to both alleles of IGHD8-2. The average percentage identities of alignments to the closest allele of IGHD8-2 vary from 54.55% to 62.09% across 204 subjects, thus indicating that ultralong CDR H3s have extremely high mutation rates. Because a high mutation rate can make allele assignment ambiguous, we used only ultralong CDR H3s with alignment percentage identities greater than 70%. To assign alleles of IGHD8-2 within a single subject, we computed the fraction of ultralong CDR H3s that have higher percentage identities of alignments to IGHD8-2*01 than to IGHD8-2*02. If the fraction was close to 1, we classified the subject as homozygous for IGHD8-2*01. If the fraction was close to 0, we classified the subject as homozygous for IGHD8-2*02. Otherwise, we classified the subject as heterozygous for alleles IGHD8-2*01 and IGHD8-2*02. Details of the allele assignment procedure are described in the Materials and Methods and Supplemental Fig. 1. As a result, 35 (17%), 84 (41%), and 85 (42%) subjects were classified as homozygous for allele IGHD8-2*01, homozygous for allele IGHD8-2*02, and heterozygous for both alleles, respectively. Thus, homozygosity for IGHD8-2*02 was more than twice as frequent as homozygosity for IGHD8-2*01.
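The genotype percentages reported above follow directly from the counts, as the short Python sketch below shows. The allele frequencies it also returns are not reported in the text; they are derived here only as simple arithmetic, under the assumptions that the zygosity calls are correct and that each animal carries two copies of IGHD8-2. The function name and the dictionary keys are illustrative.

```python
def genotype_summary(hom_01, hom_02, het):
    """Summarize genotype counts for a two-allele locus."""
    n = hom_01 + hom_02 + het
    return {
        "n": n,
        # Rounded percentages of each genotype class.
        "pct_hom_01": round(100 * hom_01 / n),
        "pct_hom_02": round(100 * hom_02 / n),
        "pct_het": round(100 * het / n),
        # Ratio of *02 homozygotes to *01 homozygotes.
        "hom_ratio_02_to_01": hom_02 / hom_01,
        # Derived allele frequencies (two gene copies per animal assumed).
        "freq_01": (2 * hom_01 + het) / (2 * n),
        "freq_02": (2 * hom_02 + het) / (2 * n),
    }


summary = genotype_summary(hom_01=35, hom_02=84, het=85)
```

With the reported counts, the genotype percentages are 17%, 41%, and 42%, and the ratio of IGHD8-2*02 homozygotes to IGHD8-2*01 homozygotes is 2.4.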
To determine whether germline DH alleles impact the cysteine content of the repertoire, we analyzed the cysteine number and germline conservation in homozygotes. Fig. 4A shows that subjects homozygous for IGHD8-2*01 have slightly higher fractions of ultralong CDR H3s with six cysteines (the average fraction is 0.52) than subjects homozygous for IGHD8-2*02 (the average fraction is 0.43) (p = 1.41 × 10^−11). Here and elsewhere, we used the Kruskal-Wallis test (35). A germline cysteine is conserved in an ultralong CDR H3 if positions corresponding to it in the alignment represent the germline codon TGT or the mutated codon TGC. Fig. 4B illustrates that germline cysteines are more conserved in ultralong CDR H3s derived from IGHD8-2*01, and p values for germline cysteines ordered according to their appearance in IGHD8-2 are 6.04 × 10^−3, 1.31 × 10^−3, 5.29 × 10^−12, and 1.34 × 10^−5. Of the germline cysteines, cysteine 1 is the most conserved at greater than 90%, followed by cysteine 4. Notably, these two cysteines often form a disulfide bond with one another as described above. IGHD8-2*01 is also characterized by a higher fraction of ultralong CDR H3s that have all four conserved germline cysteines compared with allele IGHD8-2*02 (p = 1.77 × 10^−5) (Fig. 4C).
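The two-group comparison described above can be sketched with a small standard-library implementation of the Kruskal-Wallis test, together with the codon-level definition of a conserved germline cysteine. The sketch is restricted to two groups, for which the H statistic is referred to a chi-square distribution with 1 degree of freedom, so the p value can be computed as erfc(sqrt(H/2)). The function names are hypothetical, and the toy numbers in the usage note are not data from this study.

```python
import math
from collections import Counter


def germline_cys_conserved(codon):
    """A germline cysteine counts as conserved if the aligned codon is TGT or TGC."""
    return codon.upper() in ("TGT", "TGC")


def kruskal_wallis_two_groups(x, y):
    """Return (H, p) for two samples, with average ranks and tie correction."""
    data = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(data)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and data[j][0] == data[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    rank_sums = [0.0, 0.0]
    for (_, group), rank in zip(data, ranks):
        rank_sums[group] += rank
    sizes = [len(x), len(y)]
    h = 12 / (n * (n + 1)) * sum(
        rs ** 2 / size for rs, size in zip(rank_sums, sizes)
    ) - 3 * (n + 1)
    tie_counts = Counter(v for v, _ in data).values()
    correction = 1 - sum(t ** 3 - t for t in tie_counts) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h = max(h / correction, 0.0)
    p = math.erfc(math.sqrt(h / 2))  # chi-square survival function, 1 df
    return h, p
```

For the toy samples [1, 2, 3] and [4, 5, 6], H is 27/7 (about 3.86) and the p value is just under 0.05. In practice, a vetted statistics library would be preferable to a hand-written implementation.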
To determine whether heterozygotes have greater cysteine positional diversity, we analyzed the proportion of cysteine at each position (Fig. 4D). Whereas repertoires from heterozygous subjects contain 8 aa positions where at least 20% of CDR H3s have cysteines, repertoires from subjects homozygous for IGHD8-2*01 and IGHD8-2*02 contain seven and six such positions, respectively. Thus, subjects with two different ultralong CDR H3 germline DH alleles that differ in cysteine location have a repertoire that contains cysteine at more locations than homozygotes.
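The positional analysis above amounts to counting, for each aligned amino acid position, the fraction of CDR H3s that carry a cysteine and keeping positions at or above 20%. A minimal Python sketch follows, assuming the CDR H3 amino acid sequences have already been aligned to a common coordinate system of equal length (with "-" for gaps) and that the denominator is the total number of sequences. The function name and the toy sequences are illustrative only.

```python
def cysteine_rich_positions(aligned_seqs, threshold=0.20):
    """Return 0-based positions where the cysteine fraction is >= threshold."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(aligned_seqs)
    positions = []
    for pos in range(length):
        # Count sequences with a cysteine at this aligned position.
        n_cys = sum(seq[pos].upper() == "C" for seq in aligned_seqs)
        if n_cys / n >= threshold:
            positions.append(pos)
    return positions


toy_alignment = ["CAC-", "CAAA", "CGAC", "AGAC", "CAAA"]
rich = cysteine_rich_positions(toy_alignment)
```

In the toy alignment, positions 0, 2, and 3 carry cysteine in 4/5, 1/5, and 2/5 of the sequences, respectively, so all three pass the 20% threshold.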
The Ab repertoire of cows is unusual in having few VH, DH, and JH regions to contribute to the combinatorial diversity process, but it appears to broadly use cysteine content and disulfide bond diversity created through germline cysteines and SH to create novel Ag binding structures (15, 36). As an extreme example of this, ultralong CDR H3 regions of cows appear to use the knob minidomain to bind Ag (20, 32). Immunological evolution in the knob region of cow CDR H3s appears to function as a disulfide-mediated minifold structure generator, because several crystal structures show remarkably diverse sequence content and disulfide patterns in knob regions. The knob region is exclusively encoded by a single DH in homozygous animals or either IGHD8-2*01 or IGHD8-2*02 in heterozygotes. Because ultralong CDR H3 Abs appear to use single VH, DH, and JH regions and a limited number of VL regions, there is essentially very little combinatorial diversity that can occur within this subclass of Abs. Despite this limitation, massive SH can diversify cysteine content and position, providing new disulfide loops as structural scaffolds to bind Ag (5, 15). Other mechanisms, such as gene replacement or gene conversion, may also play a role in diversity generation in cattle, but these have not been investigated in depth. Here we identify polymorphisms in the DH region as an additional mechanism to enhance the diversity of the repertoire in terms of its cysteine content and position.
In characterizing available crystal structures of ultralong CDR H3 Abs, we found representatives that use each germline DH allele. Detailed analysis allowed us to map germline cysteines onto the crystal structures. Notably, germline cysteine 1 forms a conserved disulfide bond, often with germline cysteine 4. Consistent with the structural analysis, deep sequencing showed that germline cysteine 1 is nearly universally conserved in ultralong CDR H3 regions (>90%) (Fig. 4), whereas the second most conserved germline cysteine is cysteine 4, but at a much lower frequency. Taken together, these results suggest that the naive precursor (germline) CDR H3 region may often contain a 1-4 disulfide bond. Because there are only two remaining cysteines, 2 and 3, they may also pair with one another. Alternatively, “free” cysteines may exist in the germline structure, awaiting SH to position a new cysteine nearby to form a new disulfide bond. Because these analyses used a limited number of crystal structures (two for IGHD8-2*01 and five for IGHD8-2*02), further expansion of the structural dataset will determine whether these conclusions are generalizable for ultralong CDR H3 Abs. The cow ultralong CDR H3 repertoire is unique in that a single precursor VDJ rearrangement (albeit with junctional diversity outside the knob region) appears to account for generation of the entire repertoire. No structural studies have yet been described on this unique germline CDR H3, and the indirect evidence presented here on the possible disulfide pattern is an attempt to understand its structural dynamics and early evolution. Furthermore, as more structures are solved, it may be possible to use predictive algorithms to classify sequences into different structural templates based on cysteine content and position.
The two DH alleles were present in significant frequencies in mature Ab sequences from a population of 204 cows. In this cow population, 17% of cows were IGHD8-2*01 homozygotes, and 41% were IGHD8-2*02 homozygotes. Thus, these alleles appear to be used in natural immune responses. Of the available crystal structures, we could identify examples of Abs encoded by each of the two IGHD8-2 variants. The two DH alleles are nearly identical except for their cysteine positions (Fig. 1). Heterozygous cows therefore have two germline starting points, with different loop lengths, from which novel disulfide-based loop structures may evolve. Furthermore, deletion events can further alter the cysteine positions encoded by either DH variant. Whether such heterozygous cows have advantages in immunity as a result of this added potential structural diversity has yet to be determined. Polymorphisms in V, D, and J regions are common in all vertebrate species. When amino acid changes occur in polymorphic alleles, they are often single residue changes. The IGHD8-2 region, however, is unusual in having nearly identical variants that only differ in length and, as a consequence, cysteine position. It is unknown whether the germline VDJ recombined ultralong CDR H3 encodes a knob domain that adopts a structure with a single disulfide pattern or whether structural plasticity occurs that allows different disulfide patterns to form. In the latter scenario, a single gene sequence may produce more than one structure. In other vertebrates, alternative folding of germline CDR H3 residues can indeed produce different paratope structures as a deviation from the “one gene one protein” paradigm (37–40). Whether such a mechanism extends toward multiple cysteine patterns in ultralong CDR H3 Abs is currently unknown; however, the covalent nature of disulfide bonds could stabilize germline precursors, enabling further SH processes to “lock” certain disulfide patterns favorable to binding a given Ag in place during immunological evolution. The different alleles for IGHD8-2 may encode alternative precursor structures, enabling alternative evolutionary paths during SH. Thus, in addition to combinatorial and junctional diversity as key mechanisms to provide for a diverse Ig repertoire, cysteine positional polymorphisms and heterozygosity can also add to immune receptor diversity.
We thank Duncan McGregor, Pavel Pevzner, Ruiqi Huang, Jeremy Haakenson, and Abigail Kelley for helpful conversations during the course of this work.
This work was supported by the National Institutes of Health Grants R01GM105826 and R01HD088400 to V.V.S.
The online version of this article contains supplemental material.
The authors have no financial conflicts of interest.