Abstract
In addition to producing conventional tetrameric IgGs, camelids have the particularity of producing a functional homodimeric IgG type lacking L (light) chains and only made up of two H (heavy) chains. This nonconventional IgG type is characterized by variable and constant regions referred to as VHH and CHH, respectively, and which differ from conventional VH and CH counterparts. Although the structural properties of homodimeric IgGs have been well investigated, the genetic bases involved in their generation are still largely unknown. In this study, we characterized the organization of genes coding for the H chains of tetrameric and homodimeric IgGs by constructing an alpaca (Lama pacos) genomic cosmid library. We showed that a single IgH locus in alpaca chromosome 4 contains all of the genetic elements required for the generation of the two types of Igs. The alpaca IgH locus is composed of a V region that contains both VHH and VH genes followed by a unique DH-JH cluster and C region genes, which include both CHH and CH genes. Although this general gene organization greatly resembles that of other typical mammalian Vn-Dn-Jn-Cn translocon IgH loci, the intermixed gene organization within the alpaca V and C regions reveals a new type of translocon IgH locus. Furthermore, analyses of cDNA coding for the membrane forms of IgG and IgM present in alpaca peripheral blood B cells are most consistent with the notion that the development of a B cell bearing homodimeric IgG passes through an IgM+ stage, similar to the case for conventional IgG.
The adaptive humoral immune system responds to a variety of Ags by producing specific Abs from B lymphocytes. In higher vertebrates, the Ab shape has long been thought to be restricted to the tetrameric structure, two identical dimers each made up of an Ig heavy chain (IgH) and an Ig light chain (IgL) (1). However, the finding of bona fide dimeric H chain Abs in the camelids (2) has challenged the existing paradigms as to the structure of the Ab, the Ab binding site, and Ab repertoire generation.
In mammals, the structural organization of the IgH locus consists of numerous variable (VH), diversity (DH), joining (JH), and constant (CH) genes. This type of organization, usually referred to as a translocon structure, requires sequential gene arrangement to produce a functional H chain (3). During B lymphocyte development, single VH, DH, and JH genes are joined together by a DNA recombination process to form a single VDJ exon that codes for a functional VH region. This rearranged VH region will initially be expressed together with the most J-proximal CH gene (Cμ), leading to μH chain synthesis of the IgM class or isotype. Upon Ag encounter, an additional DNA recombination event, termed class switch recombination, can take place in B lymphocytes, resulting in replacement of the Cμ gene by one of the other CH genes, namely Cγ, Cε, or Cα. This process leads to expression of a new H chain with different effector functions, thereby shifting the Ig molecule from the IgM isotype to the IgG, IgE, or IgA isotype (4).
Besides producing conventional tetrameric IgGs, camelids (i.e., camel, dromedary, llama, alpaca, guanaco, and vicuña) have the particularity of producing functional homodimeric IgGs that lack L chains and are therefore composed only of two identical H chains (2, 5). In addition to their dissimilar Ig shapes, tetrameric and homodimeric IgGs display distinct H chains. Biochemical and cDNA sequence analyses have shown that the C regions present in tetrameric and homodimeric IgGs are different, referred to as CH and CHH regions, respectively. Homodimeric IgG chains lack the first constant domain, possibly due to a point mutation in the donor splice site at the first C exon/intron boundary (2, 6, 7). Moreover, tetrameric and homodimeric IgGs differ in their V regions (so-called VH and VHH regions, respectively), which are encoded by distinct sets of V genes (8, 9). VH and VHH regions have the same general structure made up of four framework regions (FR) and three complementarity-determining regions (CDRs); however, VHH regions display, in their FR2, several hydrophilic substitutions of amino acids highly conserved across species (i.e., Val42 to Phe or Tyr, Gly49 to Glu, Leu50 to Arg or Cys, and Trp52 to Gly or Phe) (5, 10, 11). These hydrophilic amino acid substitutions in the VHH together with the absence of the first constant domain in the CHH region may explain the lack of pairing of homodimeric IgH to L chains. Consequently, and in contrast to tetrameric IgGs, the homodimeric IgGs present the VHH region as the unique domain of Ag binding.
A required step in the understanding of molecular mechanisms governing the formation of tetrameric and homodimeric IgGs in camelids is the characterization of the organization of the genes that encode them. In the present study, we showed that VH and VHH genes as well as CH and CHH genes are arranged in intermixed conformation in a single IgH locus. A unique D-J cluster bridges V and C clusters. Thus, VHH and CHH genes have emerged from conventional VH and CH genes without disrupting the typical Vn-Dn-Jn-Cn translocon organization of the IgH locus. Our transcript analyses of membrane Ig from peripheral lymphocytes strongly suggested that the development of a B cell bearing homodimeric IgG passes through an IgM+ stage, similar to the case for conventional IgG.
Materials and Methods
Genomic cosmid library construction, screening, sequencing, and assembly
A genomic cosmid library was constructed from the testicular DNA of a single alpaca (Lama pacos) using the SuperCos1 cloning vector kit (Stratagene) according to the manufacturer’s instructions. Screening of the genomic library was performed by colony hybridization using V-, J-, and C-specific radiolabeled probes. Probes were generated by PCR amplification of testicular genomic DNA with specific primers (Table I) and labeled with a random priming kit (Roche Applied Science) according to the manufacturer’s instructions. Positive clones were isolated and cosmid DNAs were purified with the NucleoSpin plasmid kit (Macherey-Nagel) according to the specific instructions for the purification of large DNA fragments.
Table I. Primer sequences
Name | Sequence (5′–3′) | Usage |
---|---|---|
VHbackA6 | gATgTgCAgCTgCAggCgTCTggRggAgg | Cosmid screening |
VFR3rev | ACAgTAATACASggCCgTgTCCTCAgRTTTC | Cosmid screening |
V1subgroupSS | gTCCAgCTggTgCAgCCAggg | DNA/cDNA library |
V2subgroupSS | CAggTgCAgCTgCAggAgTCgg | DNA/cDNA library |
V3subgroupSS | CAgKTgCAgCTCgTggAgTCTgg | DNA/cDNA library |
VsubgroupsRev | ACAgTAATACACggCCgTgCCCTCAg | DNA/cDNA library |
Cμ1-for | AgCTCATCTgCCCCgACACTC | Cosmid screening |
Cμ4-rev | GgACTTgTCCACggTCCTCTC | Cosmid screening |
Cμ2-rev | CAggACACggAgATCTCCC | cDNA library |
μM1-rev | GAggACgATgAAggTggAggCC | cDNA library |
5′JH/llama | gAACCAAAgTCAgCACAACgC | Cosmid screening |
3′JH/llama | ggTgAgCgAgCTCgTgAgAgC | Cosmid screening |
Cγ1c-for | ATCggTCTATCCTCTgACTgCTAgATgC | Cosmid screening |
Cγ3c-rev | CTCgTgCATCACCACACAgg | Cosmid screening |
Cγ2c-rev | gCAggACgCTgACCACgC | cDNA library |
γM1-rev | AgATggTggTCCACAgCCC | cDNA library |
Cε2-for | CTACACCTgCCgggTCAAC | Cosmid screening |
Cε4-rev | gAATTCgCggCTgAAgACg | Cosmid screening |
Cα1-for | CCATgAgCAgCCAgCTgACCTTgC | Cosmid screening |
Cα3-rev | gCCCACCATgCAggAgAAggTgTC | Cosmid screening |
Abbreviations: SS, sense; for, forward; rev, reverse; K = G/T.
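Several primers in Table I contain degenerate positions. As a minimal sketch of what such a primer encodes, the following Python snippet expands a degenerate primer into its unambiguous variants. The function name `expand_primer` is hypothetical, and the codes R (A/G) and S (G/C) are taken from the standard IUPAC nucleotide code rather than from the table footnote, which defines only K.

```python
from itertools import product

# Degenerate codes found in the Table I primers.
# K is defined in the table footnote; R and S follow the standard IUPAC code
# (an assumption, since the footnote does not define them).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "R": "AG", "S": "CG",
}

def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# V3subgroupSS from Table I carries a single K position.
for variant in expand_primer("CAgKTgCAgCTCgTggAgTCTgg"):
    print(variant)
```

A primer with n two-fold degenerate positions expands to 2^n sequences, so VFR3rev (one S and one R) corresponds to four oligonucleotides.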
Sequencing reactions were performed using an ABI Prism BigDye terminator cycle sequencing ready reaction kit and run on a 3730 XL genetic analyzer (Applied Biosystems) at the Genomics Platform of the Institut Pasteur (Pasteur Genopole, Paris, France). Sequences were assembled into contigs by Phred/Phrap (12, 13) and visualized using the CONSED software package. After sequencing and assembly, the contig was verified by PCR and by comparing the physical map predicted from the sequence with the physical map of all genes obtained by Southern blot at both the genomic and cosmid DNA levels. Sequence similarity searches were performed using basic local alignment search tool (BLAST) analysis against the National Center for Biotechnology Information nonredundant database. Ig gene annotations were performed according to the international ImMunoGeneTics (IMGT) information system (imgt.cines.fr). Sequences were submitted to the European Molecular Biology Laboratory (EMBL)/GenBank/DNA Data Bank of Japan (DDBJ) databases.
Fluorescence in situ hybridization (FISH)
The alpaca IgH locus was detected by dual-color FISH performed on metaphase and interphase lymphocytes prepared from fresh alpaca whole blood. Cells were cultured for 72 h in the presence of PHA-M (10 μg/ml; Roche) and arrested in metaphase by a colcemid solution (KaryoMAX colcemid solution at 10 μg/ml in PBS; Invitrogen). Cells were incubated in 0.56% KCl for 20 min at 37°C, recovered by centrifugation at 200 × g for 5 min, fixed twice in methanol/acetic acid (3:1; v/v) for 10 min at room temperature, and dropped onto glass microscope slides. The slides were then air dried, washed in 2× SSC for 1 h at 37°C, and dehydrated in ethanol gradients. Slides were denatured for 4 min at 73°C and hybridization was performed at 37°C overnight. Slides were then washed first in 0.4× SSC/0.1% Tween 20 for 2 min at 73°C and then in 0.1× SSC/0.1% Tween 20 for 2 min at room temperature. FISH probes used were the cosmids CosV54 and CosG24 labeled with SpectrumGreen-dUTP (Vysis) and SpectrumOrange-dUTP (Vysis), respectively, and prepared for FISH assay according to nick translation kit recommendations (Vysis). After mounting the probes on slides and counterstaining in Vectashield/DAPI (4′,6′-diamidino-2-phenylindole; Vector Laboratories), images were taken using a photomicroscope (Zeiss Axiophot) equipped with epifluorescence optics and a filter set. A minimum of 25 metaphases was analyzed. Following the alpaca chromosome classification (14) and with reference to conventional Giemsa-stained metaphases, we identified the chromosomal location of the alpaca IgH locus from inverted DAPI images of the analyzed metaphases, which display G bands.
PBMC isolation, RNA extraction, and RT-PCR
Heparinized blood samples were obtained from a single alpaca and PBMC were isolated by Ficoll-Hypaque 1077 gradient centrifugation (Sigma-Aldrich). Total RNA from 2 × 10⁷ PBMC was extracted by TRIzol (Invitrogen). cDNA was synthesized from 2 μg of total RNA using membrane isotype-specific primers, μM1 reverse and γM1 reverse (Table I) in a 50-μl final reaction in the presence of 30 U of avian myeloblastosis virus reverse transcriptase (Promega). Single-strand μH (IgM)- and γH (IgG)-specific products were amplified by PCR using V3 subgroup sense primer together with Cμ2 reverse or Cγ2c reverse primer, respectively (Table I).
PCR parameters
PCR amplifications were performed with 0.5 μg of alpaca genomic DNA or 10 μl of cDNA preparation in the presence of 5 U of Taq DNA polymerase in a 50-μl final reaction volume according to the manufacturer’s instructions (QBiogene). Parameters for PCR were 94°C for 5 min followed by 30 cycles of 94°C for 30 s, 55–64°C for 30 s, 72°C for 30 s to 2 min, and finally holding at 72°C for 10 min. All PCR products were cloned into a pCR2.1-TOPO vector according to the manufacturer’s instructions (TOPO TA cloning; Invitrogen) and submitted to sequencing.
Sequences and phylogenetic analysis
Multiple sequence alignments were made with ClustalW at the Biology WorkBench server. Phylogenetic trees were calculated with the PHYLIP programs PROTDIST and NEIGHBOR (neighbor joining with Poisson correction) and were based on a bootstrap of 1000 separate genetic distance matrices.
Generation of an alpaca VH/VHH germline database
DNA from two alpacas was amplified with V subgroup-specific primers (Table I) and the amplification bands were cloned. Three hundred independent clones (200 IgHV3, 50 IgHV2, and 50 IgHV1) were sequenced from each individual alpaca. A putative V gene was defined when a sequence diverging by fewer than two nucleotides was found at least twice in one of the individuals (PCR and sequencing errors were estimated at two nucleotides per 300 bp in our study). Comparison between the putative VH/VHH repertoires of the two individuals showed an overlap of 85%, and only overlapping genes were considered bona fide VH/VHH alpaca genes.
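The calling rule described above (a sequence seen at least twice, allowing fewer than two nucleotide differences between copies) can be sketched in Python as follows. This is an illustrative reading of the criterion, not the procedure actually used: the function names are hypothetical, the clones are assumed to be aligned to equal length, and the greedy ordering by abundance with suppression of near-duplicates of an already called gene is an added assumption.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def putative_v_genes(clones, max_diff=1, min_support=2):
    """Call a sequence when at least `min_support` clones lie within
    `max_diff` mismatches of it (fewer than two nucleotides -> max_diff=1)."""
    counts = Counter(clones)
    called = []
    for seq, _ in counts.most_common():
        # Skip sequences already explained by a called gene (assumption).
        if any(hamming(seq, gene) <= max_diff for gene in called):
            continue
        support = sum(n for other, n in counts.items()
                      if hamming(seq, other) <= max_diff)
        if support >= min_support:
            called.append(seq)
    return called

# Toy clones (hypothetical sequences, for illustration only).
clones = ["ACGTACGT", "ACGTACGT", "ACGTACGA",
          "TTTTCCCC", "GGGGAAAA", "GGGGAAAT"]
print(putative_v_genes(clones))
```

In this toy set the singleton `TTTTCCCC` is not called because no second clone supports it, which mirrors the intent of filtering out PCR and sequencing errors.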
cDNA sequence analyses
The nature of the VH/VHH present in cDNA clones was identified by the multiple sequence alignment program using PileUp and Pretty command lines on the GCG program (Wisconsin package version 10.2, 1999; Genetics Computer Group, Madison, WI) and confirmed by a BLAST search (blastnt and blastp) against the VH/VHH germline database we had generated (see above). Relative diversity of the variable regions was evaluated by the PlotSimilarity-IDEntity command line in the same program. PlotSimilarity calculates the average similarity among all members of a group of aligned sequences at each position in the alignment, using a sliding window of 10 for comparison. IDEntity plots the level of identity between the sequences as the number of different amino acids occurring at a given position divided by the frequency of the most common amino acid at that position. VH and VHH sequences were considered to be mutated when they contained more than two nucleotide differences in their CDRs or more than four nucleotide differences in the sequenced fragment when compared with the most homologous germline gene (PCR and sequencing errors were estimated at two nucleotides per 300 bp in our study).
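The two numerical criteria in this section can be illustrated with a short Python sketch. It follows the definitions given in the text (variability at a position is the number of different amino acids divided by the frequency of the most common one; a sequence is called mutated with more than two differences in the CDRs or more than four overall). It is not the GCG implementation, and the function names and the toy alignment columns are hypothetical.

```python
from collections import Counter

def variability(column):
    """Variability at one alignment position: number of distinct residues
    divided by the frequency of the most common residue."""
    if not column:
        raise ValueError("empty alignment column")
    counts = Counter(column)
    top_frequency = counts.most_common(1)[0][1] / len(column)
    return len(counts) / top_frequency

def is_mutated(cdr_differences, total_differences):
    """Mutation call used in the text: >2 nucleotide differences in the CDRs
    or >4 in the whole sequenced fragment, relative to the closest germline."""
    return cdr_differences > 2 or total_differences > 4

# Toy columns taken from four aligned sequences (hypothetical data).
print(variability("AAAA"))  # fully conserved position
print(variability("AAGS"))  # three residues, most common at frequency 0.5
print(is_mutated(cdr_differences=3, total_differences=3))
```

A fully conserved position scores 1, and the score rises both with the number of residues observed and with how evenly they are distributed.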
Accession numbers
Cosmid, V gene, and cDNA sequences were submitted to the EMBL/GenBank/DDBJ databases (www.ebi.ac.uk/embl/). Cosmid CosV19 is under accession no. AM773548. The sequence spanning 222,796 bp that resulted from the overlapping of eight cosmids (CosV54, CosV29, CosD, CosQ, CosG29, CosG24, Cos22, and CosG1) is under accession number AM773729. V gene sequences are under accession nos. mentioned in Fig. 1. cDNA sequences of TMVHH-Cμ are under accession nos. AM998810, AM998811, AM998812, AM998813, AM998814, AM998815, AM998816, AM998817, AM998818, and AM998819.
Amino acid sequences of alpaca IGHV1, IGHV2, and IGHV3 subgroup members, nucleotide/amino acid sequences of alpaca DH and JH genes, and V sequences. Numbering is according to position in the locus in 5′ to 3′ direction (V is underlined in all DH and JH genes) and according to the alignment/clustering of V genes. The sequences of V genes were analyzed by the multiple sequence alignment GCG program using the PileUp and Pretty command line. Only V gene sequences from the CDR1 to CDR2 regions are represented. Points indicate gaps introduced for maximal homology and dashes denote identical amino acids with VH/VHH consensus (IgHVcs). “Lp” represents Lama pacos. CDR1 (gray shading) and FR2 and CDR2 (gray shading) regions of V genes are determined according to IMGT nomenclature and indicated at the top. Black and white boxed amino acid sequences represent hallmarks of VHH and VH genes, respectively. One of six VH1 members, two of 11 VH2 members, and six of 54 VH3 members are pseudogenes in-frame with the stop codon. Four of 54 VH3 members (LpVH3-S49, -S50, -S51, and -S52) are out-of-frame pseudogenes and are not represented in the figure. For the four pseudogenes the accession numbers are AM939774, AM939775, AM939776, and AM939777. DH and JH nucleotide/amino acid sequences are represented together with the RSS elements composed of nonamer and heptamer sequences.
Results
Gene organization of the alpaca IgH locus
To identify the structural organization of alpaca genes coding for the H chains of tetrameric and homodimeric Igs (Fig. 2,A), we constructed a genomic cosmid library and screened it with V, J, and C radiolabeled probes. According to the physical mapping of positive cosmids and the type of genes they contain, eight overlapping cosmid clones covering ∼223 kb of a V-D-J-C region were selected for complete sequencing. This V-D-J-C region is organized in the following order: 5′-VHH-3VH-7DH-7JH-Cμ-Cδ-Cγ2b-Cγ1a-Cγ1b-Cγ2c-Cε-Cα-3′ (Fig. 2, B and D). Thus, the V cluster of the alpaca IgH locus contains both VHH and VH genes followed by a unique DH-JH cluster and C region genes, including CHH genes (i.e., Cγ2b and Cγ2c) of homodimeric IgGs and CH genes (i.e., Cγ1a and Cγ1b) of tetrameric IgGs (Fig. 2, B and D). To confirm that the IgH locus is unique in the alpaca genome, we performed a FISH assay in metaphase and interphase alpaca cells. Hybridization signals obtained by dual color FISH with CosV54 (VHH and VH) and CosG24 (CH and CHH) cosmid probes (Fig. 2,B) were colocalized on the telomeric long arm of alpaca chromosome 4 (Fig. 2 C). Altogether, these results demonstrate that VHH, VH, DH, JH, CHH, and CH genes are clustered together on a single IgH locus in the alpaca genome.
Alpaca immunoglobulins: structure and gene organization. A, Schematic representation of the general structure of conventional (tetrameric; blue) and nonconventional (homodimeric; green) IgGs in camelids, including alpaca. B, Alignment of cosmid clones containing alpaca Ig genes (to scale). Arrows and arrowheads indicate direction of transcription. C, Cytogenetic mapping of the IgH locus on the alpaca genome. Dual color FISH showing colocalization (merge) of cosmid clones CosV54 (VH/VHH; green) and CosG24 (CH/CHH; red) on metaphase and interphase cells. D, Details of alpaca IgH gene organization and domain structures of each C gene (not to scale). Red stars represent the point mutation in the donor splice site at the C1 exon/intron boundary of the Cγ2b and Cγ2c genes. The white star represents a stop codon mutation in Cδ3.
V region genes
The sequenced fraction of the alpaca V cluster, localized upstream from the DH-JH genes, contained one VHH and three VH genes, herein named VHH3-1, VH3-1, VH3-2, and VH1-1, respectively (Fig. 2,D and Fig. 3). Additionally, a second nonoverlapping cosmid (CosV19) containing one VHH and two VH genes (hereafter designated VHH3-S1, VH3-S1, and VH2-S1) was also sequenced (Fig. 3).
Alpaca V genes. Alignment of predicted amino acid sequences of VHH3-1, VH3-1, VH3-2, VH1-1, VHH3-S1, VH3-S1 and VH2-S1 genes. The FR1, CDR1 (gray shading), FR2, CDR2 (gray shading), and FR3 regions of V genes were determined according to IMGT nomenclature and are indicated at the top. Black and white boxed amino acids at positions 42, 49, 50, and 52 represent hallmarks of VHH and VH genes, respectively. Regulatory elements of the V gene are as follows: octamer and TATA box representing a putative promoter and leader exon parts 1 and 2 (L-PART1 and L-PART2) interrupted by an intron V sequence (IVS). Downstream from the V region, RSSs are represented by a heptamer (H) and nonamer (N) separated by a 23-bp spacer.
We were able to distinguish VHH from VH genes in the alpaca by their distinctive FR2 amino acid sequences (5, 10, 11). Thus, VHH3-1 and VHH3-S1 bear amino acids F42/Y42, E49/Q49, R50, and F52/L52, whereas VH3-1, VH3-2, VH3-S1, VH1-1, and VH2-S1 bear the typically conserved amino acids V42/I42, G49, L50, and W52/S52 (Fig. 3). The fact that both VHH and VH genes were present in the two sequenced cosmids strongly suggests that VHH and VH genes are scattered along the V cluster of the alpaca IgH locus.
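The FR2 hallmark rule can be expressed as a small lookup, sketched here in Python. The residue sets combine the substitutions listed in the Introduction with those reported above for the sequenced genes. The function name, the input format (a mapping from IMGT position to residue, with numbering done beforehand), and the majority-vote tie handling are assumptions made for illustration.

```python
# Residues reported in the text at the four FR2 hallmark positions.
VHH_HALLMARKS = {42: set("FY"), 49: set("EQ"), 50: set("RC"), 52: set("GFL")}
VH_HALLMARKS = {42: set("VI"), 49: set("G"), 50: set("L"), 52: set("WS")}

def classify_fr2(residues):
    """Classify a V gene as 'VHH', 'VH', or 'ambiguous' from its residues
    at positions 42, 49, 50, and 52 (simple majority vote, an assumption)."""
    vhh_hits = sum(residues.get(pos) in allowed
                   for pos, allowed in VHH_HALLMARKS.items())
    vh_hits = sum(residues.get(pos) in allowed
                  for pos, allowed in VH_HALLMARKS.items())
    if vhh_hits > vh_hits:
        return "VHH"
    if vh_hits > vhh_hits:
        return "VH"
    return "ambiguous"

print(classify_fr2({42: "F", 49: "E", 50: "R", 52: "F"}))  # VHH-type pattern
print(classify_fr2({42: "V", 49: "G", 50: "L", 52: "W"}))  # VH-type pattern
```

Sequences with a mixed pattern are returned as ambiguous rather than forced into either class, since the text does not state how such cases would be resolved.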
All V genes reported here are potentially functional, as suggested by the presence of the following: 1) upstream regulatory elements (i.e., octamer and TATA box); 2) leader exons (i.e., leader part 1 and leader part 2); 3) an uninterrupted open reading frame (i.e., V-region); and 4) a downstream recombination signal sequence (RSS) composed of heptamer and nonamer sequences separated by a 23-bp spacer (Fig. 3).
Phylogenetic analyses of the seven V genes revealed the existence of at least three V subgroups in the alpaca IgH locus. Based on their degree of homology with human IgHV clans I, II, and III, we designated them the IgHV1, IgHV2, and IgHV3 subgroups (Fig. 4). The identification of the IgHV1 and IgHV2 subgroups in this study contrasts with previous reports in which only members of the IgHV3 subgroup were found in a PCR-amplified VH/VHH genomic/cDNA database of a dromedary and a llama (11, 15) (Fig. 4).
Phylogenetic tree of alpaca V gene subgroups. Comparison of amino acid consensus-related human IgHV I, II, and III clans and dromedary and llama VH3 and VHH3 sets. Each corresponding consensus was obtained from IMGT sequence resources. For alpaca V genes, the number before hyphen indicates a subgroup and the digit after the hyphen indicates the gene or subset name within the subgroup/set (Fig. 1). The numbers 62, 95, and 100 represent bootstrap values.
To determine the VH/VHH germline repertoire, we used V subgroup-specific primers to amplify alpaca DNA. From 600 sequences analyzed, we found that the IgHV3, IgHV2, and IgHV1 subgroups contain 71, 11, and 6 V gene members, respectively (Fig. 1). Interestingly, only the IgHV3 subgroup contains both VHH and VH genes (17 VHH members divided into six subsets and 54 VH members divided into 12 subsets; Fig. 1), whereas the IgHV1 and IgHV2 subgroups contain exclusively VH members (Figs. 1 and 4). Members of all V subgroups were found to be expressed in our alpaca cDNA databases. Our results showed that the germline and expressed repertoires of the V gene in camelids are larger than previously defined and suggested that VHH genes emerged from preexisting VH members of the IgHV3 subgroup.
DH-JH cluster
Four kilobases downstream from the last V gene we identified seven different DH genes (designated DH1 to DH7) spanning a 38.5-kb stretch of DNA (Fig. 2,D). The identification of RSS elements composed of nonamer and heptamer elements separated by 12 bp on both sides of each DH gene, together with the existence of at least one open reading frame, suggests that they are potentially functional (Fig. 1). Reminiscent of other mammalian IgH loci (i.e., human and mouse) (16, 17), the DH7 gene is clustered together with the JH genes.
The JH cluster contains seven genes tightly packed together (Fig. 2,D). All JH genes (i.e., JH1–JH7) are potentially functional as suggested by the presence of an upstream RSS element with a 22- to 23-bp spacer, one open reading frame, and a downstream RNA donor splicing site (Fig. 1).
The presence of a unique DH-JH cluster in the alpaca IgH locus forces the VHH as well as the VH genes to recombine with the same DH and JH genes.
C region genes
The alpaca IgH locus contains eight C genes located downstream from the DH-JH cluster. Based on their relative positions inside the locus (Fig. 2,B), their domain structure (Fig. 2,D), and their degree of similarity to other Ig mammalian C genes (data not shown), we identified one Cμ, one Cδ, four Cγ, one Cε, and one Cα gene. Characterization of C genes of the alpaca IgH locus revealed that Cγ2b and Cγ2c genes encoding the CHH region of homodimeric IgG2b and IgG2c, respectively, flank Cγ1a and Cγ1b genes encoding the CH region of conventional IgG1a and IgG1b, respectively (Fig. 2, B and D). We were able to distinguish Cγ2 genes from Cγ1 genes on the basis of two features previously described in cloned Cγ genes and/or cDNA in camelid members: 1) the presence of a point mutation (G to A) in the putative donor splicing site flanking the first C exon; and 2) their specific hinge domain (data not shown) (5, 6, 7, 11, 18).
In addition, the eight alpaca C genes showed typical features in their intron-exon organization. Except for the Cδ gene, which is characterized by several stop codons inside the third C exon and by the apparent absence of exons coding hinge, secretory, and membrane domains, all other alpaca C genes produce a complete open reading frame (Fig. 2 D). Thus, considering the domain structure of the eight C genes, typical IgM, IgG1, IgG2, IgE, and IgA can potentially be expressed by the B lymphocyte in either secreted forms or as membrane-bound receptors for Ag.
Switch (S) and enhancer regions
Upstream of each C region gene we identified tandem repeat sequences, termed switch or S regions. These S regions are known to serve as targets for class switch recombination events in all described mammalian IgH loci, replacing the Cμ region gene by one of the downstream C region genes (19, 20, 21). In alpaca, the putative Sμ region spans >3.5 kb of DNA and predominantly contains the AGCT tetramer or the GAGCT and GGGCT pentamers as tandem repeats. The same sequence repeats are also found within Sδ, Sγ, Sε, and Sα. Sγ regions are also composed of longer tandem repeats. These data suggest that switch recombination events in the alpaca IgH locus probably take place between the donor Sμ region and one of the downstream acceptor Sγ2, Sγ1, Sε, or Sα regions by the deletion of intervening sequences, as described in other species (21, 22).
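As a rough illustration of how the repeat content of a candidate S region might be tallied, the Python sketch below counts occurrences of the three motifs named above, allowing overlapping matches. The function name and the toy sequence are hypothetical, and this is not the procedure used to define the S regions in this study.

```python
import re

# Repeat motifs reported for the putative alpaca switch regions.
MOTIFS = ("AGCT", "GAGCT", "GGGCT")

def count_switch_motifs(sequence):
    """Count (possibly overlapping) occurrences of each switch motif."""
    sequence = sequence.upper()
    # A zero-width lookahead lets overlapping matches be counted.
    return {motif: len(re.findall(f"(?={motif})", sequence))
            for motif in MOTIFS}

# Toy sequence (hypothetical, not taken from the alpaca locus).
toy_region = "GAGCTGAGCTGGGCTAGCT"
print(count_switch_motifs(toy_region))
```

Because every GAGCT contains AGCT, the AGCT count includes the GAGCT occurrences, so the three counts should not be summed as if they were independent.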
In addition to the S regions, we identified a putative enhancer (Eμ enhancer) in the JH-Cμ intron (Fig. 2 D) known to drive IgH transcription and to be involved in the regulation of IgH gene assembly (23, 24). This Eμ enhancer displays typical nuclear binding motifs μE1-μE5-μE2-μA-μB-μE4-O flanked by AT-rich nuclear matrix attachment region sequences.
Features in common with other Artiodactyla species
As previously described in sheep, bovine, and pig genomes (25, 26), our analyses also revealed that alpaca Cμ1 and Cδ1 exons display >90% amino acid identity. This close similarity is specific to species belonging to the Artiodactyla order, as attested to by the lower similarity observed, for example, in humans and mice (32.1 and 22.1%, respectively). This striking similarity is not restricted to Cμ1 and Cδ1 exons but also covers the Sμ and Sδ regions (the Sμ-Cμ1 and Sδ-Cδ1 regions share 79.7% sequence identity). This strongly suggests that an unequal crossing over or a gene conversion event between Cμ and Cδ region genes replaced the preexisting Cδ1 exon and created an Sδ region before speciation of the Artiodactyla members 40–55 million years ago. Of note, the presence of an Sδ region upstream of the Cδ gene is also unique to Artiodactyla members (25).
Membrane IgG2, IgG1, and IgM expressed by peripheral blood B cells
The organization of the alpaca IgH locus raises the question of whether B cells expressing homodimeric IgGs go through an IgM+ stage during their development, as has been shown for B cells expressing tetrameric IgGs in other mammalian species (4) and is likely also the case for camelids. Peripheral blood B cells are a mixture of newly formed, naive surface (s)IgM+ cells and Ag-experienced, activated/memory sIgM+, sIgG+, sIgA+, or sIgE+ B cells. Naive and memory B cells can be distinguished by their V gene sequences. Thus, naive B cells bear germline-encoded V region genes whereas memory B cells accumulate somatic mutations, particularly in the CDRs (27, 28, 29, 30, 31, 32).
Previous studies using mice transgenic for chimeric dromedary or llama/human IgH loci reported that homodimeric IgG2 can be expressed at the cell surface of developing and mature B cells and can replace IgM in its function during B cell development and as an Ag sensor (33, 34). These data could be interpreted as evidence that camelid B cells expressing homodimeric IgGs develop without passing through an IgM+ stage. If this were the case, one would expect to find a substantial fraction (higher than that of IgG1 genes and possibly similar to that of IgM genes) of unmutated IgG2 genes in peripheral blood B cells from alpaca. In contrast, if this were not the case one would expect to find VHH-IgM transcripts and a small fraction of unmutated IgG2 not very different from that of IgG1. To evaluate these possibilities, we analyzed transcripts coding for the membrane forms of IgG and IgM present in alpaca peripheral blood B cells. Because only the IgHV3 subgroup contains both VH and VHH genes, we restricted the analyses to this subgroup. As expected, amplification with IgHV3 and pan-Cγ primers resulted in two bands of sizes compatible with C1-containing IgG1 and C1-lacking IgG2 transcripts, whereas amplification with the same IgHV3 primer and a Cμ-specific primer resulted in a single band consistent with a C1-containing IgM (Fig. 5).
H chain amplification of IgM (μH), IgG1 (γ1), and IgG2 (γ2). Transcripts encoding the membrane forms of the μH and γH chains were amplified by PCR with primers specific to the IgHV3 subgroup and Cμ2 or Cγ2 exons.
The amplification products were purified and cloned, and a total of 576 randomly picked clones (192 per amplification band) were sequenced and compared with the alpaca germline VH and VHH databases we had previously generated (Fig. 1). The V gene used was unambiguously identified in ∼80% of the clones (Figs. 1 and 6,A). Consistent with data obtained in mice and humans (27, 28, 29, 30, 31, 32), 67 and 95% of VH genes found together with IgM and IgG1, respectively, were mutated, and most of the mutations were clustered in hypervariable regions (Fig. 6,B). Interestingly, 96% of VHH genes found together with IgG2 were also mutated, mostly in their CDRs, indicating that peripheral blood B cells bearing IgG2, similar to those bearing IgG1 and about half of those bearing IgM, are memory B cells (Fig. 6,C). Furthermore, 5.2% of IgM transcripts were found to bear VHH genes (Fig. 6,A), and all of them were unmutated (data not shown), thus reflecting the naive status of the cells carrying them (Fig. 6,C). Altogether, these data are most consistent with the notion that the development of a B cell bearing IgG2 passes through an IgM+ stage and that switching to IgG2, similar to the case for IgG1, requires prior Ag encounter.
VH/VHH usage and relative variability in sIgM, sIgG1, and sIgG2 peripheral blood B cells. A, IgHV3 subgroup gene usage (VHH3 subset genes B, A, D, and C (shades of green); VH3 subset genes A, F, J, K, and H (shades of blue)) in IgM, IgG1, and IgG2 repertoires. NI denotes V regions that could not be unambiguously assigned to any known germline V gene. cDNAs in which the V and C regions were crossed over (i.e., VHCγ2, 7%; VHHCγ1, 3%) likely reflect PCR artifacts and were excluded from analyses. B and C, Variability analyses of expressed V genes (VH3 subset genes A, F, J, K, and H; VHH3 subset genes A, B, C, and D) used for IgM, IgG1, and IgG2 repertoires compared with germline V gene counterparts (VH3g in blue and VHH3g in green). Variability scores were computed with the GCG plot similarity-IDEntity program. CDRs were defined based on IMGT nomenclature.
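The classification of clones as mutated or unmutated relative to germline can be illustrated with a minimal sketch. It is not the procedure used in this study: it assumes that clone and germline V sequences have already been aligned to equal length, it scores them by simple mismatch counting, and all names (`assign_germline`, `mutated_fraction`) and the toy sequences are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_germline(clone: str, germline: Dict[str, str]) -> Tuple[Optional[str], int]:
    """Return (closest germline gene, mismatch count).

    The gene name is None when two germline genes tie for the best score,
    i.e. the V gene cannot be unambiguously identified (the NI category).
    """
    scored = sorted((hamming(clone, seq), name) for name, seq in germline.items())
    best_dist, best_name = scored[0]
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None, best_dist
    return best_name, best_dist


def mutated_fraction(clones: List[str], germline: Dict[str, str]) -> float:
    """Fraction of unambiguously assigned clones that differ from their germline gene."""
    assigned = [assign_germline(c, germline) for c in clones]
    identified = [dist for name, dist in assigned if name is not None]
    if not identified:
        return 0.0
    return sum(dist > 0 for dist in identified) / len(identified)


if __name__ == "__main__":
    # Toy 8-nt "genes", invented purely for illustration.
    toy_germline = {"VH3-A": "ACGTACGT", "VHH3-B": "TTGTACCA"}
    toy_clones = ["ACGTACGT", "ACGTACGA", "TTGTACCA", "TTGAACCA"]
    print(mutated_fraction(toy_clones, toy_germline))
```

In practice an alignment tool would replace the mismatch count, and the mutated fraction would be computed separately for each isotype (IgM, IgG1, IgG2).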
D and J usage in expressed VH and VHH repertoires
From the sequences described above, we were able to determine and compare the frequencies at which different JH genes were found in expressed VH and VHH regions. Only five of the seven alpaca JH genes were found in the expressed Ig repertoire, suggesting that JH1 and JH5 may be pseudogenes (Fig. 7,A). Along the same lines, it is interesting to note that JH1 lacks the WGXG motif found in most, if not all, functional JH genes in mammals (IMGT database; Ref. 35) and that the putative RSSs of JH1 and JH5 differ substantially from the consensus RSS, particularly in their nonamers (Fig. 1). The other five JH genes were found at comparable frequencies in VH and VHH regions, with clear overrepresentation of the JH4 gene, which was found in about half of the sequences (Fig. 7,A). The extensive mutations present in the sequenced CDR3 regions precluded a similar analysis of the utilization of the different DH genes. However, each of the seven DH genes could be unambiguously identified in a set of sequences, indicating that they are all functional and contribute substantially to the functional diversity of IgH chains in alpaca (Fig. 7,B).
Usage of DH and JH. A, Distribution of JH gene usage among VH and VHH expressed as part of IgM, IgG1, and IgG2 H chains. B, All alpaca DH genes are used. Examples of junctional sequences found in IgM, IgG1, or IgG2 chains in which the DH genes used could be unambiguously determined.
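The two checks discussed above, tallying JH usage and screening JH segments for the WGXG motif, can be sketched as follows. This is only an illustration under stated assumptions: the function names `has_wgxg` and `jh_usage` are hypothetical, the input is assumed to be a list of per-clone JH gene calls and translated JH sequences, and the example calls are invented rather than taken from the data of this study.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# W-G-X-G, where X is any amino acid (one-letter code).
WGXG = re.compile(r"WG.G")


def has_wgxg(jh_protein: str) -> bool:
    """True if the translated JH segment contains a W-G-X-G motif."""
    return WGXG.search(jh_protein) is not None


def jh_usage(calls: List[str], all_genes: List[str]) -> Tuple[Dict[str, float], List[str]]:
    """Return per-gene usage frequencies and the list of genes never observed."""
    counts = Counter(calls)
    total = len(calls)
    freqs = {g: (counts[g] / total if total else 0.0) for g in all_genes}
    unused = [g for g in all_genes if counts[g] == 0]
    return freqs, unused


if __name__ == "__main__":
    genes = ["JH%d" % i for i in range(1, 8)]
    # Invented toy calls, chosen only to mimic the qualitative pattern in the text.
    toy_calls = ["JH4"] * 5 + ["JH2", "JH2", "JH3", "JH6", "JH7"]
    freqs, unused = jh_usage(toy_calls, genes)
    print(freqs["JH4"], unused)
```

A gene that is both absent from the expressed repertoire and fails the motif check would be a pseudogene candidate, although absence from a finite sample is not by itself proof of nonfunctionality.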
Discussion
One IgH locus in the alpaca genome
In this study, we showed that a single IgH locus in the alpaca genome contains all of the genetic elements required for generation of the two types of Igs that characterize the specific immune system of the camelids: tetrameric and homodimeric IgGs. The alpaca IgH locus has maintained the common general Vn-Dn-Jn-Cn translocon structure because: 1) VH and VHH genes localize on the same part of the V region; 2) CH and CHH genes also localize on the same part of the C region; and 3) there is only one DH-JH cluster that links the V region to the C region. Although this general structure greatly resembles that of other mammalian IgH loci (21), the intermixed organization of VH and VHH genes within the V region and that of CH and CHH genes within the C region reveal a new type of translocon IgH locus. Intermixed organization of V genes that can be expressed as parts of two different proteins is a common feature of the TCRα/δ locus present in every species analyzed to date (36). Vα and Vδ genes, however, rearrange to different (D) and J genes. Thus far, camelids are unique among tetrapods in that two different sets of V genes located in the same translocon IgH locus rearrange to the same D-J cluster and then to different sets of C genes to generate two different proteins: the H chains of tetrameric and homodimeric IgGs.
IgM status in camelids
B cell development culminates with the production of naive resting IgM+ B cells that, upon Ag encounter, differentiate into IgM+ or IgG+ memory B cells and into Ab-secreting (IgM+ or IgG+) effector cells. There is no reason to believe that this scenario is substantially different with respect to B cells expressing tetrameric Igs in camelids. The finding of homodimeric IgGs in the absence of detectable levels of homodimeric IgMs in these species (37, 38) raises important questions concerning the development and differentiation of B cells that secrete homodimeric Abs in particular and whether homodimeric IgG-expressing cells go through an IgM+ stage during their development. Some of the results presented here pertain to this question.
We have shown that virtually all of the membrane IgG2 sequences obtained from peripheral blood B cells are mutated, indicating the absence of naive cells in the sIgG2 population, as in the sIgG1 population. Thus, there must be a population of naive B cells that is the progenitor of the IgG2+ cells. An obvious candidate would be a population of cells expressing homodimeric IgMs. Although the absence of specific reagents precludes formal identification of such a population, the presence of VHH-IgM transcripts in alpaca peripheral B cells strongly suggests its existence. Furthermore, the unmutated state of the V genes and the use of the same VHH, DH, and JH genes in the IgM and IgG2 populations, together with the structural organization of the IgH locus in alpaca shown here, are most consistent with such a possibility. Thus, the rearrangement of a VHH to a DJ is expected to result in expression of the same VHH gene in the form of an IgM. In addition, the switch regions flanking the Cμ and Cγ2 are apparently normal, strongly suggesting that recombination between these two regions is required for the formation of IgG2 mRNA and protein. These data strongly suggest the existence of a population of cells expressing VHH-IgMs that, upon Ag challenge, would switch to IgG2+ cells without differentiating into VHH-IgM+ memory and plasma B cells. To maintain Ag specificity after switching, the VHH-IgM+ cells must lack L chain expression at the cell surface, and it has been suggested that the FR2 substitutions that characterize VHH region genes prevent their assembly with L chains (10, 18, 39, 40). It is of note that the paradigm that an IgH must associate with surrogate L chains to gain transport and signaling competence has recently been challenged (41, 42, 43). An alternative interpretation, namely that camelid B cells could directly express homodimeric sIgG at the earliest stage of B cell development, cannot be formally excluded at the present time.
Whether the rearranged VHH genes of such VHH-IgM+ cells are subjected to additional diversification by somatic mutation in an Ag-independent manner remains an unresolved question. It is known that post-rearrangement diversification by somatic mutation and/or gene conversion occurs in GALT (gut-associated lymphoid tissue) species to generate their “preimmune” V repertoire (44, 45, 46).
One or two B cell precursors in camelids?
A key question that arises from this analysis is how “Ig-type choice” is ensured during B lymphocyte development so that a cell expresses either tetrameric or homodimeric IgG. One possibility is that VH and VHH rearrange stochastically in common progenitor cells and that the nature of the first productive rearrangement, VH or VHH, determines B cell fate. An alternative hypothesis would be that the two types of IgGs are expressed by two separate lineages that originate from two independent progenitors in which the choice of V gene rearrangement is targeted. This is somewhat reminiscent of mouse TCRα/δ locus regulation at the time that progenitor cells rearrange their TCRδ V genes, excluding TCRα V genes despite the fact that both types of genes are intermixed in the genome (36). Different B lineages have been suggested as ensuring regulation of the expression of the different Ig clusters found in the genome of jawed fish (shark, skate, and ray) (47, 48, 49, 50). Among these clusters, which are restricted to particular Ig isotypes, some are expressed in conventional Igs (i.e., IgW or IgM) (51) and others are expressed in homodimeric Igs (i.e., the Ig new Ag receptor or IgNAR) (52, 53).
Our study provides tools and a framework for investigating the mechanisms governing the differential expression of tetrameric and homodimeric IgGs and their regulation during B lymphocyte development.
Acknowledgments
We thank Pablo Pereira, Luis Bruno Barreiro, Gérard Eberl, and Noëlle Doyen for helpful discussion and comments on the manuscript. We also thank Sandrine Chantot-Bastaraud and Arlette Leneveu of the “Hôpital Tenon, Service d’Histologie, Biologie de la Reproduction, Cytogénétique” for help in performing the FISH assay and Thierry Petit of the “Parc Zoologique de La Palmyre” for generously providing alpaca testis.
Disclosures
The authors have no financial conflict of interest.
Footnotes
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Abbreviations used in this paper: FR, framework region; BLAST, basic local alignment search tool; IMGT, ImMunoGeneTics information system; FISH, fluorescence in situ hybridization; RSS, recombination signal sequence; sIg, surface Ig; S region, switch region.