Abstract
Characterization of Ag-specific BCR repertoires is essential for understanding disease mechanisms involving humoral immunity. This is optimally done by interrogation of paired H chain V region (VH) and L chain V region (VL) sequences of individual and Ag-specific B cells. By applying single-cell high-throughput sequencing on gut lesion plasma cells (PCs), we have analyzed the transglutaminase 2 (TG2)-specific VH:VL autoantibody repertoire of celiac disease (CD) patients. Autoantibodies against TG2 are a hallmark of CD, and anti-TG2 IgA-producing gut PCs accumulate in patients upon gluten ingestion. Altogether, we analyzed paired VH and VL sequences of 1482 TG2-specific and 1421 non–TG2-specific gut PCs from 10 CD patients. Among TG2-specific PCs, we observed a striking bias in IGHV and IGKV/IGLV gene usage, as well as pairing preferences with a particular presence of the IGHV5-51:IGKV1-5 pair. Selective and biased VH:VL pairing was particularly evident among expanded clones. In general, TG2-specific PCs had lower numbers of mutations both in VH and VL genes than in non–TG2-specific PCs. TG2-specific PCs using IGHV5-51 had particularly few mutations. Importantly, VL segments paired with IGHV5-51 displayed proportionally low mutation numbers, suggesting that the low mutation rate among IGHV5-51 PCs is dictated by the BCR specificity. Finally, we observed selective amino acid changes in VH and VL and striking CDR3 length and J segment selection among TG2-specific IGHV5-51:IGKV1-5 pairs. Hence this study reveals features of a disease- and Ag-specific autoantibody repertoire with preferred VH:VL usage and pairings, limited mutations, clonal dominance, and selection of particular CDR3 sequences.
Introduction
Autoimmune diseases are typically characterized by the presence of specific autoantibodies. Abs are soluble Igs consisting of H and L chains that are produced by plasma cells (PCs) as terminally differentiated B cells. Cell-surface Ig together with accessory molecules make up the BCR that allows B cells to specifically recognize Ags. IgG-producing PCs are devoid of cell-surface Ig, whereas IgA- and IgM-producing PCs retain a functional BCR (1, 2). Recognition of autoantigen by BCRs and Abs is considered a key event in adaptive immune responses that can lead to the development of autoimmune disease.
Upon recognition of Ag, and typically with the help of T cells, B cells proliferate and undergo affinity maturation by accumulation of somatic mutations in Ig genes. B cell responses to foreign or self-antigens are characterized by the activation of multiple reactive B cell clones. During the response, there is selection of B cells that are particularly fit to recognize Ag. Interrogation of an autoantibody response is ideally done by large-scale characterization of the BCR repertoire of Ag-specific cells at a single-cell level. This is now feasible with arrival of high-throughput sequencing (HTS) technologies. Recently, analysis of thousands of naive and Ag-experienced single B cells in healthy subjects was reported (3), but so far no studies have been done with knowledge of the Ag involved.
Celiac disease (CD) presents as a disease ideal to pioneer this type of approach in relation to autoimmunity. CD is an autoimmune disorder driven by exposure to dietary gluten proteins that is characterized by highly disease-specific Abs reactive with the enzyme transglutaminase 2 (TG2) and selective killing of enterocytes mediated by immune cells (4). Presence of serum IgA anti-TG2 Abs at high titer is now considered diagnostic in children, and the IgA anti-TG2 Abs are among the autoantibodies with the highest specificity and sensitivity for any autoimmune disease (5). In the celiac lesion of the proximal small intestine there is accumulation of TG2-specific IgA PCs, which, on average, accounts for 10% of the local PCs (6). These cells express cell-surface Ig, allowing isolation of Ag-specific cells from gut biopsies of individual patients by use of labeled TG2. In previous studies, we reported generation of a panel of 63 anti-TG2 mAbs (6), as well as bulk HTS of IGHV genes (7), from TG2-specific PCs. In this study, we have developed a high-throughput protocol for sequencing of H chain V regions (VH) and L chain V regions (VL) of single Ag-specific cells and used it to characterize the anti-TG2 IgA response in CD patients. This study gives detailed insight into important aspects of an autoimmune B cell response, including the nature of clonal expansions and the restricted usage and strong pairing preference of particular VH and VL gene segments.
Materials and Methods
Subjects and cells
Biopsies used for preparing the single-cell suspension were collected from a total of 10 CD patients: 8 untreated consuming a normal diet and 2 treated consuming a gluten-free diet. Diagnosis of all the subjects was done according to the guidelines of the British Society of Gastroenterology (8). Clinicopathological details of all the subjects are given in Table I. TG2-specific serum IgA Ab levels were determined using the Celikey Tissue Transglutaminase IgA kit (Thermo Fisher Scientific). Before collecting biopsies, informed consent was obtained from all patients and the study had been approved by the Regional Ethics Committee of South-Eastern Norway (REK 2010/2720). Duodenal biopsies obtained by endoscopy were collected in ice-cold RPMI 1640 (Sigma-Aldrich). To obtain lamina propria lymphocytes, tissues were digested with collagenase (1 mg/ml; Sigma-Aldrich) in RPMI 1640 with 3% FCS at 37°C for 1 h under constant rotation. Tissue debris was removed by passing the digest through a 40-μM cell strainer, followed by centrifugation and washing two times with RPMI 1640. The single-cell suspension was cryopreserved in RPMI 1640 containing 50% FCS and 10% DMSO until used.
Sorting of single PCs
Recombinant human TG2 produced in Sf9 insect cells with an N-terminal BirA biotinylation site was used for sorting of TG2-specific PCs. Biotinylated TG2 and allophycocyanin-conjugated streptamers (IBA) were mixed at a 4:1 molar ratio in PBS supplemented with 3% FCS and incubated for 1 h on ice to generate TG2 multimers. For staining, single-cell suspensions from gut biopsies were incubated with TG2-streptamer-allophycocyanin conjugate, anti–IgA-FITC (Southern Biotech), anti–CD3-BV570 (BioLegend), anti–CD14-BV570 (BioLegend), anti–CD19-PB (BioLegend), anti–CD27-PE-Cy7 (eBioscience), and anti–Igκ-PerCPCy5.5 (BioLegend) for 45 min on ice. TG2-specific (large CD3−CD14−CD27+CD19+/−IgA+TG2+) and non–TG2-specific (large CD3−CD14−CD27+CD19+/−IgA+TG2−) single PCs were sorted using FACSAria II (BD) into 96-well plates containing 5 μl of catch buffer containing RNase free H2O, 1× first strand buffer (Invitrogen), 800 nM STRT-T30 primer (9) [alone or in combination with 800 nM CACH1 (10) primer], 800 nM barcoded (well-specific) TSO primers (9), 5 mM DTT (Invitrogen), 0.8 U/μl RNasin (Promega), and 0.02% Tween 20 in each well. Immediately after sorting, plates were sealed using plate sealers, centrifuged at 2500 rpm for 1 min, and stored at −70°C until cDNA synthesis.
Single-cell RT-PCR and cDNA purification
Before cDNA synthesis, plates containing single cells were incubated at 72°C for 3 min, centrifuged, and immediately placed on ice. Each well was added with 5 μl of RT mix containing RNase free H2O, 1× first strand buffer, 2 mM dNTPs (Thermo Scientific), 1.6 M betaine (Sigma-Aldrich), 12 mM MgCl2 (Sigma-Aldrich), 0.8 U/μl RNasin, and 4 U/μl SuperScript II Reverse Transcriptase (Invitrogen). For cDNA synthesis, plates were incubated at 42°C for 70 min and 70°C for 10 min. After synthesis, cDNA was purified using Agencourt RNAClean XP beads (Beckman Coulter) and stored at −20°C until further use.
PCR sequencing library preparation
VH, Vκ, and Vλ genes were amplified from cDNA in two rounds of PCR. For the first round, PCR mix, containing H2O, 1× KAPA HiFi HotStart ReadyMix (Kapa Biosystems), 0.25 μM of each primer [STRT For 2, CaCH1-2 (10), IgK GSP1 (11), IgLC Rev (12)] (Supplemental Table I), 0.05 U/μl USER Enzyme (New England Biolabs), and cDNA in a total volume of 20 μl, was subjected to the following conditions: 37°C for 15 min (to facilitate the action of USER enzyme), 95°C for 2 min, 1× (98°C for 15 s, 70°C for 30 s, 72°C for 40 s), 1× (98°C for 15 s, 67°C for 30 s, 72°C for 40 s), 23× (98°C for 15 s, 60°C for 30 s, 72°C for 40 s), and 72°C for 5 min. For the second round, a reaction mixture containing H2O, 1× KAPA HiFi HotStart ReadyMix, 0.25 μM forward primer (R2-STRT), 0.25 μM barcoded primers corresponding to IGHJ, IGKC, and IGLC (Supplemental Table I), and first-round PCR product was subjected to the following conditions: 95°C for 2 min, 1× (98°C for 15 s, 70°C for 30 s, 72°C for 40 s), 1× (98°C for 15 s, 67°C for 30 s, 72°C for 40 s), 13× (98°C for 15 s, 60°C for 30 s, 72°C for 40 s), and 72°C for 5 min. Illumina MiSeq adapter sequences were introduced at both ends of second-round PCR products in the third round of PCR, using Qiagen Multiplex PCR Kit (Qiagen) under PCR conditions: 95°C for 15 min, 10× (95°C for 30 s, 60°C for 45 s, 72°C for 90 s) and 72°C for 10 min. Final amplicon libraries were first concentrated using Agencourt AMPure XP beads (Beckman Coulter), then extracted from agarose gel using QIAquick Gel Extraction Kit (Qiagen), and further purified using QIAquick PCR Purification Kit (Qiagen); then paired-end sequencing of 300 bp was performed using Illumina MiSeq at the Norwegian Sequencing Centre (Oslo, Norway; http://www.sequencing.uio.no).
Processing of raw sequencing data
Quality evaluation of raw reads was first done using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and raw data were further processed using pRESTO (repertoire sequencing toolkit) (13). In short, reads with mean Phred quality <30 were removed, followed by removal of reads that did not contain valid primers. Reads were then assembled to a full sequence. Sequences were then marked according to gene-specific primers (IGH/IGK/IGL); well, plate indices, and identical sequences belonging to the same well and plate were collapsed to remove duplicate sequences. The number of sequences collapsing as one sequence was denoted as “dupcount.” Only sequences with ≥2 dupcount were used for further analysis. The collapsed VH and VL sequences were annotated using IMGT HighV-Quest (14). Further data analysis and preparation was done using an in-house–developed sequence analysis pipeline: Immune Receptor Information System (IRIS). Only sequences having number of dupcounts ≥5 were included for the data preparation. In cases where, for a cell, sequences more than one VH:VL pair appeared, the VH/VL sequence with dupcount ≥5-fold higher than others was selected for analysis; otherwise, the corresponding well was assumed to have more than one cell and discarded from the analysis. If a well contained both IGKV and IGLV sequences, then the corresponding cell was assumed to express both chains. Highly similar (<5 nt difference) IGHV/IGKV/IGLV gene segments belonging to the same subgroup were commonly named as IGH/K/L-VX-p (e.g., IGKV1-39-p for IGKV1-39 and IGKV1D-39) to avoid ambiguous IGHV/IGKV/IGLV gene assignments by IMGT. In case ambiguity with regard to IGHV/IGKV/IGLV gene and IGHJ/IGKJ/IGLJ gene assignments still remained, such sequences were not included for the calculation of respective gene usage frequency. Circos plots, depicting the VH:VL pairing frequency, was generated using Circos Table Viewer (15). Plots showing amino acid changes were generated using Seq2Logo (16).
Clonal assignment between different subpopulations
Functional VH and VL sequences from sorted TG2-specific and non–TG2-specific PCs were assigned into clonal groups using Change-O – Repertoire clonal assignment toolkit (17). In short, clones having common IGHV/IGKV/IGLV and IGHJ/IGKJ/IGLJ gene segments, equal junction length, and sequences differing from one another by a threshold distance <0.15 (measured using the hs5f mutability and substitution model) (18) within the junction region were assigned to the same clonal group. The threshold distance was determined by manual inspection of the nearest neighbor histogram generated (applying length-normalized hs5f model) using the SHazaM R package (17–19). Clonal assignment was done separately for VH and VL sequences. A VL sequence was assumed to be part of a clonal group only if the corresponding VH sequence also belonged to the same group.
Generation of lineage trees
Lineage trees for different clonal groups were generated using Alakazam tool (17) after assigning a full-length IMGT-numbered germline sequence to each clone using IMGT reference database of IGHV/IGKV/IGLV and IGHJ/IGKJ/IGLJ gene sequences with a masked (replaced with Ns) D region. Maximum parsimony lineages were inferred using the dnapars application of PHYLIP (20) using the germline sequence as the outgroup. This was followed by recursively replacing inferred ancestors in each tree with descendants having a Hamming distance of zero from their inferred parent. Branch lengths were assigned as the Hamming distance between sequences, that is, the number of unambiguous mutation events.
Similarity analysis
Statistical analysis
For each VH:VL gene segment pairing, a 2 × 2 contingency table was computed. The table contained the count (number of observations) for the particular VH:VL pair, the count for the given VH gene paired with any other VL, the count for the given VL gene paired with any other VH, as well as the count of any other VH gene paired with any other VL gene. Significant deviation from random pairing was assessed using Fisher exact test on each contingency table. The resulting p values were corrected for multiple testing by multiplying each p value with the total number of possible pairings (the number of distinct observed VH genes multiplied by the number of distinct observed V genes). This corresponds to dividing a desired significance level by the number of tests (Bonferroni correction). Significance for specific IGKV1-5:IGKJ combinations was computed in the same manner.
Generation of mAbs
TG2-specific mAbs were expressed as human IgG1 in 293-F cells and purified on Protein G as previously described (24). To obtain the plasmids for expression of λ-containing mAbs, we subcloned synthetic DNA (GenScript) encoding the VH and VL regions of selected PCs into Ig expression vectors between the AgeI and SalI (VH) or the AgeI and XhoI (VL) sites (25). Reversion of IGKV1-5 residue 106 to germline configuration was done by overlap-extension PCR using anti-TG2 mAb 679-14-E06 as template (6). The resulting κ-chain sequence was cloned into the expression vector between the AgeI and HindIII sites.
ELISAs
Recombinant human TG2 produced in insect cells (Phadia) was coated overnight in TBS at 3 μg/ml. For comparison of wild-type and mutant versions of TG2, proteins were produced in Escherichia coli as previously described (24, 26). mAbs were added in various concentrations in TBS containing 0.1% (v/v) Tween 20 (TBST) followed by incubation for 1 h at 37°C. After washing with TBST, bound mAbs were detected using alkaline phosphatase–conjugated rabbit anti-human IgG (Abcam). For competitive ELISAs, coated TG2 was incubated with various concentrations of λ-containing IgG1 mAbs in 100 μl of 3% (w/v) BSA/TBST for 30 min at 37°C. Without removing the IgG1 mAbs, different κ-containing anti-TG2 IgA1 mAbs were added in 10 μl of buffer at 0.2 μg/ml followed by continued incubation for 1 h. After washing with TBST, bound IgA1 was detected using alkaline phosphatase–conjugated goat anti-human IgA (Sigma). Measured OD values were compared with the signals obtained in the absence of competing IgG1 mAbs.
Sequence data
The sequence data presented in this study have been deposited in the National Center for Biotechnology Information Short Read Archive (https://www.ncbi.nlm.nih.gov/sra) under accession numbers PRJNA374803 and PRJNA374798.
Results
Paired analysis of TG2-specific VH and VL repertoire from single cells
Difficulties in isolating Ag-specific PCs have impeded large-scale analysis of authentically paired VH and VL genes. We took advantage of CD and the disease-associated expansion of TG2-specific PCs to generate a large-scale data set of paired VH:VL genes using a single-cell HTS approach. Duodenal biopsies from 10 CD patients, most of whom were untreated, were used for single-cell flow cytometry sorting (Table I). cDNA was synthesized using oligo-DT or a combination of oligo-DT and IgA-specific primers and unbiased template switching (9, 27). VH and VL PCR amplicons were generated and sequenced on the Illumina MiSeq platform. The gene segment usage of each cell could be assigned by use of cell- and sample-specific barcodes. For VL sequencing, the success rate was around 85% of the input cells. The inclusion of IgA-specific primers for cDNA synthesis improved VH sequencing efficiency from 40 to 75% compared with oligo-DT primers, resulting in paired sequence information for around 60% of the input cells with this protocol. Only cells with paired VH and VL sequence information were considered for further analysis. A total of 1482 TG2-specific and 1421 non–TG2-specific PCs were analyzed (Table II). Consistent with earlier findings (7), TG2-specific PC populations contained many clonally related sequences and showed less heterogeneity than non–TG2-specific PCs, indicating a restricted repertoire (Table II).
Patient ID . | Gender . | Age (y) . | Serum Anti-TG2 Titer . | Marsh Score . | HLA-DQ2.5 . | Treatment Statusa . |
---|---|---|---|---|---|---|
CD1245 | F | 33 | <1 | 0 | + | Treated |
CD1256 | F | 19 | 1.1 | 3a | + | Untreated |
CD1257 | F | 27 | >100 | 3b | + | Untreated |
CD1259 | F | 40 | 3 | 3a | + | Untreated |
CD1263 | F | 31 | >100 | 3c | + | Untreated |
CD1320 | F | 44 | 16.2 | 3a | + | Untreated |
CD1322 | F | 22 | 8.8 | 3b | + | Untreated |
CD1332 | F | 36 | <1 | 1 | + | Treated |
CD1338 | F | 20 | 2.8 | 3b | + | Untreated |
CD1356 | F | 21 | N.A. | 3c | + | Untreated |
Patient ID . | Gender . | Age (y) . | Serum Anti-TG2 Titer . | Marsh Score . | HLA-DQ2.5 . | Treatment Statusa . |
---|---|---|---|---|---|---|
CD1245 | F | 33 | <1 | 0 | + | Treated |
CD1256 | F | 19 | 1.1 | 3a | + | Untreated |
CD1257 | F | 27 | >100 | 3b | + | Untreated |
CD1259 | F | 40 | 3 | 3a | + | Untreated |
CD1263 | F | 31 | >100 | 3c | + | Untreated |
CD1320 | F | 44 | 16.2 | 3a | + | Untreated |
CD1322 | F | 22 | 8.8 | 3b | + | Untreated |
CD1332 | F | 36 | <1 | 1 | + | Treated |
CD1338 | F | 20 | 2.8 | 3b | + | Untreated |
CD1356 | F | 21 | N.A. | 3c | + | Untreated |
Treated means patients have previously received a diagnosis of CD and were adherent to a gluten-free diet. Untreated means the patient was seen for the first time and the endoscopy was done as part of the diagnostic procedures for CD. In some cases, the patients could possibly already have reduced their gluten intake while waiting for endoscopy.
F, female; N.A., not available.
. | No. of Single Cells . | No. of Clonotypes . | Heterogeneity (%)a . | |||
---|---|---|---|---|---|---|
Patient ID . | TG2− . | TG2+ . | TG2− . | TG2+ . | TG2− . | TG2+ . |
CD1245 | 128 | 31 | 121 | 22 | 94.5 | 70.9 |
CD1256 | 80 | 99 | 76 | 59 | 95.0 | 59.6 |
CD1257 | 146 | 115 | 138 | 75 | 94.5 | 65.2 |
CD1259 | 105 | 84 | 101 | 56 | 96.2 | 66.6 |
CD1263 | 61 | 62 | 60 | 39 | 98.4 | 62.9 |
CD1320 | 167 | 153 | 136 | 81 | 81.4 | 52.9 |
CD1322 | 287 | 265 | 266 | 151 | 92.6 | 56.9 |
CD1332 | 293 | 278 | 280 | 172 | 95.5 | 61.8 |
CD1338 | 113 | 155 | 109 | 132 | 96.4 | 85.1 |
CD1356 | 41 | 240 | 38 | 101 | 92.6 | 42.0 |
Total | 1421 | 1482 | 1325 | 888 |
. | No. of Single Cells . | No. of Clonotypes . | Heterogeneity (%)a . | |||
---|---|---|---|---|---|---|
Patient ID . | TG2− . | TG2+ . | TG2− . | TG2+ . | TG2− . | TG2+ . |
CD1245 | 128 | 31 | 121 | 22 | 94.5 | 70.9 |
CD1256 | 80 | 99 | 76 | 59 | 95.0 | 59.6 |
CD1257 | 146 | 115 | 138 | 75 | 94.5 | 65.2 |
CD1259 | 105 | 84 | 101 | 56 | 96.2 | 66.6 |
CD1263 | 61 | 62 | 60 | 39 | 98.4 | 62.9 |
CD1320 | 167 | 153 | 136 | 81 | 81.4 | 52.9 |
CD1322 | 287 | 265 | 266 | 151 | 92.6 | 56.9 |
CD1332 | 293 | 278 | 280 | 172 | 95.5 | 61.8 |
CD1338 | 113 | 155 | 109 | 132 | 96.4 | 85.1 |
CD1356 | 41 | 240 | 38 | 101 | 92.6 | 42.0 |
Total | 1421 | 1482 | 1325 | 888 |
Percentage heterogeneity was calculated by dividing the number of clonotypes by total number of cells and then multiplying by 100.
TG2+, TG2-specific; TG2−, non–TG2-specific.
Certain L chain V gene segments are overrepresented among TG2-specific PCs
Similar to biases observed in BCR V gene usage in several autoimmune conditions (28, 29), VH gene usage among TG2-specific PCs from CD patients was previously found to show a strong bias toward a few IGHV gene segments (6, 7). In this study, the analysis was extended to TG2-specific VL sequences. Usage frequency was calculated by taking only unique clonotypes into account to exclude the effect of clonal expansion. In a variety of autoantibodies, including rheumatoid factor and Abs with specificity for dsDNA, phospholipids, histone A2, and laminin, λ L chain was more frequent than κ L chain (28–31). In contrast, TG2-specific PCs expressed κ at higher percentages than non–TG2-specific PCs (Fig. 1A, 1B). In a previous study, done with a small number of sequences, only 5% of TG2-specific PCs were estimated to use λ L chains (6), whereas in this study, 20% of them were λ-expressing. The reason for this difference could reflect patient-to-patient variation or the use of different Ag preparations for staining of PCs in two studies. To test whether the λ-expressing PCs are truly TG2-reactive and verify the specificity of our staining, we selected Ab sequences of two such PCs and expressed them recombinantly as human IgG1. Both mAbs were TG2-specific and found to target epitopes located in the N-terminal or core domain of the enzyme (Supplemental Fig. 1), which is in agreement with the location of epitopes previously assigned to anti-TG2 mAbs (24, 26). One of the mAbs used the IGHV3-48 gene segment that was previously found to be overrepresented among TG2-specific mAbs together with κ L chains (6). Most of the mAbs using IGHV3-48 were found to target the same epitope located around residue Arg19 (26, 32). Importantly, the λ-containing IGHV3-48 mAb was not reactive with a TG2 R19S mutant (Supplemental Fig. 1B), suggesting that the same epitopes can be targeted by κ- and λ-using Abs.
Selective usage of VL gene segments among TG2-specific (TG2+) PCs. (A and B) Averaged percentages of Igκ- and Igλ-expressing PCs determined by sequencing (A) or flow cytometry (B). Each bar graph represents average ± SD calculated from sequence data of 10 patients. (C) VL usage frequency for a representative untreated CD patient (CD1322). (D) Average VL usage frequency for a total of 10 patients. Number of total unique clonotypes denoted by n (C) was taken into account for the usage frequency calculation (C and D). (E) Usage frequency of JH and JL gene segments among TG2+ and non–TG2-specific (TG2−) PCs is comparable. Frequency has been calculated from the sequence data combined for all 10 patients. Significance was determined using a two-tailed t test. *p < 0.05, ***p < 0.001. n.s., not significant.
Selective usage of VL gene segments among TG2-specific (TG2+) PCs. (A and B) Averaged percentages of Igκ- and Igλ-expressing PCs determined by sequencing (A) or flow cytometry (B). Each bar graph represents average ± SD calculated from sequence data of 10 patients. (C) VL usage frequency for a representative untreated CD patient (CD1322). (D) Average VL usage frequency for a total of 10 patients. Number of total unique clonotypes denoted by n (C) was taken into account for the usage frequency calculation (C and D). (E) Usage frequency of JH and JL gene segments among TG2+ and non–TG2-specific (TG2−) PCs is comparable. Frequency has been calculated from the sequence data combined for all 10 patients. Significance was determined using a two-tailed t test. *p < 0.05, ***p < 0.001. n.s., not significant.
The VL gene segments preferentially used by TG2-specific PCs belonged primarily to the IGKV1 gene family. In particular, IGKV1-39 (18.4%) and IGKV1-5 (14.4%) were frequently used (Fig. 1C, 1D). No noticeable difference was observed in the frequency of JH or JL gene usage between TG2-specific and non–TG2-specific PC populations (Fig. 1E).
IGHV5-51 and IGKV1-5 preferentially pair among TG2-specific PCs
To gain insight into the dependence of Ab specificity on VH:VL and to find TG2-specific signatory VH:VL combinations, we calculated the frequency of each VH:VL pair. Only unique clonotypes were taken into account to exclude the influence of clonal expansion on observed pairing frequencies. In general, each VH segment paired only with a fraction of VL segments (Fig. 2A, 2B). A number of VH:VL pairs, including IGHV5-51:IGKV1-5, IGHV1-69:IGKV1-17, and IGHV3-48:IGLV5-45, were found at frequencies significantly higher than expected among TG2-specific PCs, and in some cases they were completely absent among non–TG2-specific PCs (Fig. 2A–C, Table III). This is in agreement with a recent study that showed that, relative to the naive repertoire, some VH:VL pairs were increased or decreased among Ag-experienced B cells (3). Among TG2-specific PCs, the IGHV5-51:IGKV1-5 pair was the most frequent (8.3%) (Fig. 2C). Of all TG2-specific PCs having IGHV5-51, L chain usage was clearly dominated by IGKV1-5 (Fig. 2D, 2E, Table III), indicating that the specificity of these Abs depends on both chains. This is consistent with a previous study showing that some TG2-specific Abs lose binding when native VH:VL pairing is changed (24). Besides IGKV1-5, although at lower frequencies, IGHV5-51 also associated with IGKV1-39 and IGKV3-20 (Fig. 2D, 2E). Although most of the highly frequent VH:VL pairs featured κ L chains, some pairs (e.g., IGHV3-48:IGLV1-47, IGHV3-48:IGLV5-45, and IGHV3-74:IGLV1-44) featuring λ L chain were found exclusively among TG2-specific PCs (Fig. 2C, Table III).
TG2-specific (TG2+) BCRs show strong VH:VL pairing preferences. (A) Circos plot depicting the VH:VL pairings for a representative CD patient (CD1256). Pairing between VH (red arcs) and VL (dark blue arcs) gene segments is shown by the connecting lines inside the circle with thickness corresponding to pairing frequencies. (B) Heat map showing the relative VH:VL pairing frequencies for frequently (>0.4%) observed pairs. The color intensity index for each pair was obtained by dividing the difference in frequency between TG2+ and non–TG2-specific (TG2−) PCs with the highest difference value. Averages of frequency values from 10 patients were used. (C) TG2+ VH:VL gene segment pairs, with observed frequencies significantly higher than expected given random pairing. (D) Pairing frequencies of IGHV5-51 with different L chains. (E) VL gene segments that paired most frequently with IGHV5-51. Each group of bar graphs shows the pairing frequency for the indicated CD patient. Each bar graph shows average ± SD calculated from sequence data of 10 patients (C and D).
TG2-specific (TG2+) BCRs show strong VH:VL pairing preferences. (A) Circos plot depicting the VH:VL pairings for a representative CD patient (CD1256). Pairing between VH (red arcs) and VL (dark blue arcs) gene segments is shown by the connecting lines inside the circle with thickness corresponding to pairing frequencies. (B) Heat map showing the relative VH:VL pairing frequencies for frequently (>0.4%) observed pairs. The color intensity index for each pair was obtained by dividing the difference in frequency between TG2+ and non–TG2-specific (TG2−) PCs with the highest difference value. Averages of frequency values from 10 patients were used. (C) TG2+ VH:VL gene segment pairs, with observed frequencies significantly higher than expected given random pairing. (D) Pairing frequencies of IGHV5-51 with different L chains. (E) VL gene segments that paired most frequently with IGHV5-51. Each group of bar graphs shows the pairing frequency for the indicated CD patient. Each bar graph shows average ± SD calculated from sequence data of 10 patients (C and D).
Combinations . | p Valuea . | Enrichmentb . |
---|---|---|
HV1-3:KV4-1 | 0.045 | 6.08 |
HV1-69-p:KV1-17-p | 0.001 | 5.51 |
HV4-4:KV1-39-p | 0.015 | 2.47 |
HV5-51:KV1-5 | 5.4 × 10−27 | 3.64 |
HV3-74:LV1-44 | 0.003 | 6.96 |
HV3-48:LV5-45 | 1.62 × 10−5 | 6.3 |
HV3-48:LV1-47 | 0.002 | 4.51 |
HV4-34: KV1-39-p | 3.6 × 10−4 | 3.78 |
Combinations . | p Valuea . | Enrichmentb . |
---|---|---|
HV1-3:KV4-1 | 0.045 | 6.08 |
HV1-69-p:KV1-17-p | 0.001 | 5.51 |
HV4-4:KV1-39-p | 0.015 | 2.47 |
HV5-51:KV1-5 | 5.4 × 10−27 | 3.64 |
HV3-74:LV1-44 | 0.003 | 6.96 |
HV3-48:LV5-45 | 1.62 × 10−5 | 6.3 |
HV3-48:LV1-47 | 0.002 | 4.51 |
HV4-34: KV1-39-p | 3.6 × 10−4 | 3.78 |
None of the non–TG2-specific VH:VL gene segment combinations were associated with a significant p value (p < 0.05, Fisher exact test followed by Bonferroni correction). The results are based on unique clonotype sequences combined for all 10 patients.
Observed value divided by expected value according to product model.
Analysis of BCR-repertoire similarity using Morisita–Horn indices demonstrated that TG2-specific VH:VL sequences from different CD patients clustered closely together and had more relatedness than non–TG2-specific VH:VL sequences (Supplemental Fig. 2). This confirms the presence of stereotypic sequences in TG2-specific Ab responses among different individuals.
Certain TG2-specific VH:VL pairs undergo frequent clonal expansion
After Ag encounter, B cell clones with high affinity to specific Ags are selected and undergo clonal expansion. To get an indication of whether frequently found TG2-specific VH:VL pairs are selected for specificity and affinity, we analyzed the extent of clonal expansion among TG2-specific PCs and the propensity with which certain VH:VL pairs expanded in individual CD patients. For establishing true clonality, both VH and VL sequences were taken into account. As would be expected for an Ag-restricted immune reaction, cell/clonotype ratio for all the CD patients was >1, thus reflecting clonal expansion among TG2-specific PCs (Fig. 3A). Large fractions of the TG2-specific clones were found to be expanded, whereas this was not seen for non–TG2-specific clones (Fig. 3B). Clonal families with three or more cells were also detected at noticeably high levels among TG2-specific PCs (Fig. 3C), indicative of a restricted repertoire. Certain VH:VL pairs from TG2-specific PCs showed a tendency to expand, and most of them contained IGKV1-39 L chains (Fig. 3D). Among all the clonally expanded TG2-specific VH:VL pairs, the IGHV5-51:IGKV1-5 pair was present in most patients (Fig. 3E, 3F). Although this VH:VL pair could also be detected among the non–TG2-specific PC population in 3 of 10 patients, these cells did not show any expansion (Fig. 3F). The VH:VL pair associated with most expanded clones differed from patient to patient (Fig. 3G). Interestingly, most of these biggest sized clonal families consisted of IGKV1-39 associated with different VH gene segments (Fig. 3H). This is in agreement with a recent study where IGKV1-39:IGKJ2 rearrangement was indeed found to be very promiscuous among human peripheral blood memory cells (33). In comparison with IGKV1-39, IGKV1-5 did not show a tendency of pairing with as diverse VH gene segments among expanded clones (Fig. 3D, 3E), thus indicating a restricted specificity. Next to IGKV1-39, IGKV3-20 also paired with many different VH gene segments among expanded TG2-specific PCs (Fig. 3D, 3F). This might reflect the observation that IGKV3-20 is among the most used VL gene segments in humans (34).
Clonality among TG2-specific (TG2+) PCs is fairly widespread. (A) Ratio of total sequences (assumed to be cells) and number of clonotypes for individual CD patients (B). Percentage of expanded clonotypes. Each plot corresponds to indicated CD patient. (C) Absolute frequency of differently sized clonal families. (D) Heat map indicating the VH:VL pairs with strong propensity for clonal expansion. The color intensity index for each pair was obtained by dividing the difference in average (n = 10) fold expansion between TG2+ and non–TG2-specific (TG2−) PCs with the highest difference value. (E) Percentage of analyzed patients (n = 10) where plotted VH:VL combinations showed expansion. (F) Observed fold expansion for IGHV5-51:IGKV1-5 bearing PCs in individual CD patients. (G) VH:VL pair expressed by most expanded clones in each individual. The numbers in parentheses show the size of the clonal group (numerator) versus total number of cells (denominator).
Clonality among TG2-specific (TG2+) PCs is fairly widespread. (A) Ratio of total sequences (assumed to be cells) and number of clonotypes for individual CD patients (B). Percentage of expanded clonotypes. Each plot corresponds to indicated CD patient. (C) Absolute frequency of differently sized clonal families. (D) Heat map indicating the VH:VL pairs with strong propensity for clonal expansion. The color intensity index for each pair was obtained by dividing the difference in average (n = 10) fold expansion between TG2+ and non–TG2-specific (TG2−) PCs with the highest difference value. (E) Percentage of analyzed patients (n = 10) where plotted VH:VL combinations showed expansion. (F) Observed fold expansion for IGHV5-51:IGKV1-5 bearing PCs in individual CD patients. (G) VH:VL pair expressed by most expanded clones in each individual. The numbers in parentheses show the size of the clonal group (numerator) versus total number of cells (denominator).
TG2-specific VH and VL carry few mutations
We next analyzed the phylogenetic relationship between cells belonging to individual clonotypes and the extent of mutation among TG2-specific VH and VL sequences. Phylogenetic analysis demonstrated that clones with a particular mutation expanded more than others (Fig. 4), possibly because of increased affinity for TG2. Interestingly, although most of the clones were mutated, a number of them had low mutation levels and a few had acquired only a single mutation (Fig. 4). Notably, we observed that mutation levels in the TG2-specific IGHV and IGKV/IGLV gene segments were significantly lower than those of non–TG2-specific PCs (average 12.5 versus 19.2 for IGHV and 9.2 versus 13.3 for IGKV/IGLV), although some of the TG2-specific PCs did have high levels of mutation (Fig. 5A). The decrease in number of mutations in H and L chains was proportionate (Fig. 5B). The global effect on mutational activity, affecting both H and L chains, suggests that TG2-specific B cells undergo limited somatic hypermutation and possibly spend little or no time in germinal centers before PC differentiation. The higher mutation rate observed among non–TG2-specific PCs is in keeping with earlier findings showing that V region associated with IgA contains a high degree of mutations (6, 35). Thus, the low-level mutation among TG2-specific PCs appears to be unusual for an IgA-expressing PC population.
TG2-specific PCs acquire mutations as they clonally expand. Phylogenetic trees, generated using VH sequences, show the clonal relationship between TG2-specific PCs belonging to indicated CD patients. Associated VH and VL gene segments for each clonal family have been indicated. The numbers in the parentheses denote number of total and nonsilent mutations in cells with identical VH sequences (circles). Importantly, the shape of the trees was deduced from the combined IGHV, IGHD, and IGHJ sequences, whereas the indicated mutation numbers reflect IGHV and IGKV (underlined) only. Cells having identical VH sequences, but differing in IGKV mutation number have been indicated. Size of the circles corresponds to the number of cells with identical VH sequence. Cited text inside the circle represents names of individual cells.
TG2-specific PCs acquire mutations as they clonally expand. Phylogenetic trees, generated using VH sequences, show the clonal relationship between TG2-specific PCs belonging to indicated CD patients. Associated VH and VL gene segments for each clonal family have been indicated. The numbers in the parentheses denote number of total and nonsilent mutations in cells with identical VH sequences (circles). Importantly, the shape of the trees was deduced from the combined IGHV, IGHD, and IGHJ sequences, whereas the indicated mutation numbers reflect IGHV and IGKV (underlined) only. Cells having identical VH sequences, but differing in IGKV mutation number have been indicated. Size of the circles corresponds to the number of cells with identical VH sequence. Cited text inside the circle represents names of individual cells.
TG2-specific (TG2+) PCs acquire few mutations. (A) Total number of mutations in TG2+ (n = 1419) and non–TG2-specific (TG2−) (n = 1351) IGHV, IGKV, and IGLV genes. (B) Correlation between mutation numbers in paired IGHV and IGKV/IGLV. (C) Number of total mutations in TG2+ IGHV5-51 and other (all except IGHV5-51) IGHV gene segments. (D) Number of total mutations in TG2+ IGKV/IGLV gene segments paired with either IGHV5-51 or other (all except IGHV5-51) IGHV gene segments. (E) Number of mutations in TG2+ IGHV5-51 and of paired IGKV/IGLV. (A, C, and D) Horizontal bars show the average value. Number of sequences (from 10 patients) is denoted by n. Significance was determined using unpaired t test. ****p < 0.0001.
TG2-specific (TG2+) PCs acquire few mutations. (A) Total number of mutations in TG2+ (n = 1419) and non–TG2-specific (TG2−) (n = 1351) IGHV, IGKV, and IGLV genes. (B) Correlation between mutation numbers in paired IGHV and IGKV/IGLV. (C) Number of total mutations in TG2+ IGHV5-51 and other (all except IGHV5-51) IGHV gene segments. (D) Number of total mutations in TG2+ IGKV/IGLV gene segments paired with either IGHV5-51 or other (all except IGHV5-51) IGHV gene segments. (E) Number of mutations in TG2+ IGHV5-51 and of paired IGKV/IGLV. (A, C, and D) Horizontal bars show the average value. Number of sequences (from 10 patients) is denoted by n. Significance was determined using unpaired t test. ****p < 0.0001.
We corroborated on mutation frequencies among TG2-specific PCs using IGHV5-51 and found that they contain fewer (average 7.1 versus 13.6) mutations than TG2-specific PCs using other IGHV gene segments (Fig. 5C) (6). It has been proposed that low mutation rate in IGHV5-51 might be a characteristic feature of this gene segment (36). If so, TG2-specific VL paired to IGHV5-51 would be expected to have mutation levels comparable with VL paired to other TG2-specific IGHV genes. Our analysis revealed that this is not the case, because the VL gene segments paired with IGHV5-51 had significantly lower mutation levels than VL gene segments paired with other IGHV genes (average 6.2 versus 9.8) (Fig. 5D). The numbers of mutations in IGHV5-51 and associated VL gene segments were linearly correlated (Fig. 5E). This suggests that the low mutation rate among IGHV5-51 is not inherent to this IGHV, but rather an Ag-specific phenomenon.
Amino acid changes in VH and VL are selective
Given the strong selection for IGHV5-51:IGKV1-5 and assuming that such Abs dock onto the same epitope of TG2, it was of particular interest to analyze the mutational pattern in this VH:VL pair. Previously, molecular dynamics simulation and mutational analysis of the interaction between TG2 and the Fab fragment of the CD-derived mAb 679-14-E06 that carries IGHV5-51:IGKV1-5 indicated involvement of VH residues D62, D64, K82, and S83 and VL residue K56 (24). Consistent with these results, we observed no substitution or conservative substitutions at these positions in the HTS data except for position 64 of VH (Fig. 6A, Supplemental Fig. 3). The 2 aa changes observed for the VL of 679-14-E06 (K45R and Q106H) (24), were observed at a frequency of 4.2 and 49.5%, respectively, among TG2-specific IGKV1-5:IGHV5-51 BCRs (Fig. 6A, Supplemental Fig. 3). The Q106H substitution was significantly more frequent when IGKV1-5 was paired with IGHV5-51 than when it was paired with other IGHV gene segments or when it was used by non–TG2-specific PCs, and the frequency of the Q106H substitution was significantly higher than that in TG2-specific IGKV1-5 paired with non-IGHV5-51 or non–TG2-specific IGKV1-5 (Fig. 6A–D). In a previous study, reversion of two mutations (K45R and Q106H) in VL of 679-14-E06 to germline led to significant reduction in binding to TG2 (6). In this article, we assessed the Q106H alone, observing a slight but significant reduction in binding to TG2 (Fig. 6E), indicating that this mutation, at a position within the CDR3 region, gives a modest, but possibly decisive, increase in Ab affinity.
Amino acid changes in TG2-specific (TG2+) IGHV5-51 and IGKV1-5 are strongly selective. Graphs show the frequency of amino acid changes (y-axis) for TG2+ IGHV5-51 and IGKV1-5 paired to each other (A) at positions (IMGT numbering shown on x-axis) undergoing frequent amino acid changes. (B) Frequency of amino acid changes for TG2+ IGKV1-5 paired to VH gene segments other than IGHV5-51 and non–TG2-specific (TG2−) IGKV1-5 (C). Height of each letter (amino acid code) corresponds to relative frequency, and amino acid residues with similar physicochemical properties are depicted in the same colors. Number of unique sequences subjected to this analysis is denoted by n. Letters below the numbers on the x-axis show the corresponding germline amino acid residues. According to IMGT numbering, positions 27–38, 56–65, and 105 correspond to CDR1, CDR2, and the first residue of CDR3, respectively. Underlined amino acid residues correspond to ones that were predicted to be engaged in binding between TG2 and the Fab fragment of anti-TG2 mAb 679-14-E06 (24). The second tier of letters below the x-axis shows the amino acid changes observed for 679-14-E06 (24). (D) Frequency of Q to H replacement mutation at position 106 of IGKV1-5 when paired to IGHV5-51 or VH gene segments other than IGHV5-51 (others). Each dot (number of unique sequence n = 4–18) represents one individual CD patient except for TG2− IGKV1-5 paired to IGHV5-51, where an average of all is shown (number of unique sequence n = 5). (E) Role of IGKV1-5 residue 106 in TG2 binding. The prototype anti-TG2 mAb 679-14-E06 using the IGHV5-51:IGKV1-5 pair was expressed either in its native form (106H) or with VL residue 106 reverted to the germline configuration (106Q). Binding of the mAbs to TG2 was assessed by ELISA. EC50 values were obtained by nonlinear regression analysis, and the 95% confidence intervals are given in parentheses. Data from two independent experiments are shown with error bars indicating SD. Significance in (D) was determined using one-way ANOVA. ***p < 0.001, ****p < 0.0001. n.s., not significant.
Amino acid changes in TG2-specific (TG2+) IGHV5-51 and IGKV1-5 are strongly selective. Graphs show the frequency of amino acid changes (y-axis) for TG2+ IGHV5-51 and IGKV1-5 paired to each other (A) at positions (IMGT numbering shown on x-axis) undergoing frequent amino acid changes. (B) Frequency of amino acid changes for TG2+ IGKV1-5 paired to VH gene segments other than IGHV5-51 and non–TG2-specific (TG2−) IGKV1-5 (C). Height of each letter (amino acid code) corresponds to relative frequency, and amino acid residues with similar physicochemical properties are depicted in the same colors. Number of unique sequences subjected to this analysis is denoted by n. Letters below the numbers on the x-axis show the corresponding germline amino acid residues. According to IMGT numbering, positions 27–38, 56–65, and 105 correspond to CDR1, CDR2, and the first residue of CDR3, respectively. Underlined amino acid residues correspond to ones that were predicted to be engaged in binding between TG2 and the Fab fragment of anti-TG2 mAb 679-14-E06 (24). The second tier of letters below the x-axis shows the amino acid changes observed for 679-14-E06 (24). (D) Frequency of Q to H replacement mutation at position 106 of IGKV1-5 when paired to IGHV5-51 or VH gene segments other than IGHV5-51 (others). Each dot (number of unique sequence n = 4–18) represents one individual CD patient except for TG2− IGKV1-5 paired to IGHV5-51, where an average of all is shown (number of unique sequence n = 5). (E) Role of IGKV1-5 residue 106 in TG2 binding. The prototype anti-TG2 mAb 679-14-E06 using the IGHV5-51:IGKV1-5 pair was expressed either in its native form (106H) or with VL residue 106 reverted to the germline configuration (106Q). Binding of the mAbs to TG2 was assessed by ELISA. EC50 values were obtained by nonlinear regression analysis, and the 95% confidence intervals are given in parentheses. Data from two independent experiments are shown with error bars indicating SD. Significance in (D) was determined using one-way ANOVA. ***p < 0.001, ****p < 0.0001. n.s., not significant.
TG2-specific IGHV5-51:IGKV1-5 pairs display length bias in their CDR3 sequences
We also analyzed CDR3 length and observed uneven distribution with striking overrepresentation of lengths of 14 and 16 aa for the H chain and 11 aa in the κ L chain when TG2-specific PCs were compared with non–TG2-specific PCs (Fig. 7A). This peculiar distribution skewing was rooted in the bias for IGHV5-51 and IGKV1-5 among TG2-specific PCs (Fig. 7B, 7C). Notably, PCs carrying the IGHV5-51:IGKV1-5 pair had a strong dominance of 14-aa-long CDR3s in the H chain and 11-aa-long CDR3s in the κ L chain (Fig. 7A–C). Moreover, 11-aa-long L chain CDR3 were biased toward usage of the IGKJ2 gene segment (Fig. 7D). The observed biases are not due to clonal expansion because the frequency calculation was based on number of unique clonotypes. The patterns are striking and hardly coincidental, and they are likely dictated by the epitope interaction. However, a meaningful interpretation will require detailed x-ray crystal structures of prototype Abs in complex with the TG2 Ag.
Bias in CDR3 lengths among TG2-specific (TG2+) PCs. (A) VH and VL CDR3 length distribution among TG2+ and non–TG2-specific (TG2−) PCs as calculated from the sequence data combined for all (n = 10) patients. (B) Frequency of CDR3 amino acid lengths among TG2+ IGHV5-51 paired to IGKV1-5 or other L chain V genes (others) and TG2− IGHV5-51. (C) Frequency of CDR3 amino acid lengths among TG2+ IGKV1-5 paired to IGHV5-51 or other H chain V genes (others) and TG2− IGKV1-5. (D) Frequency of IGKJ gene usage by TG2+ IGKV1-5 containing 9- or 11-aa-long CDR3 and paired to IGHV5-51 or paired to other H chain V genes or by TG2− IGKV1-5. TG2+ IGKV1-5 containing 11-aa-long CDR3 combined to IGKJ2 and IGKJ3 at significantly higher frequency than expected (p = 3.9 × 10−6 and p = 0.018, respectively). Frequencies have been calculated from the sequence data combined for all 10 patients. Calculations are based on number of unique clonotypes denoted by n (B–D).
Bias in CDR3 lengths among TG2-specific (TG2+) PCs. (A) VH and VL CDR3 length distribution among TG2+ and non–TG2-specific (TG2−) PCs as calculated from the sequence data combined for all (n = 10) patients. (B) Frequency of CDR3 amino acid lengths among TG2+ IGHV5-51 paired to IGKV1-5 or other L chain V genes (others) and TG2− IGHV5-51. (C) Frequency of CDR3 amino acid lengths among TG2+ IGKV1-5 paired to IGHV5-51 or other H chain V genes (others) and TG2− IGKV1-5. (D) Frequency of IGKJ gene usage by TG2+ IGKV1-5 containing 9- or 11-aa-long CDR3 and paired to IGHV5-51 or paired to other H chain V genes or by TG2− IGKV1-5. TG2+ IGKV1-5 containing 11-aa-long CDR3 combined to IGKJ2 and IGKJ3 at significantly higher frequency than expected (p = 3.9 × 10−6 and p = 0.018, respectively). Frequencies have been calculated from the sequence data combined for all 10 patients. Calculations are based on number of unique clonotypes denoted by n (B–D).
Discussion
In this study, we describe isolation and large-scale single-cell sequencing of Ig VH and VL genes of TG2-specific PCs from the CD gut lesion. The results reveal common patterns across patients with a striking preferential VH:VL pairing, dominance of certain VH:VL pairs, and significant clonal expansions with the largest expansions most often seen among PCs with the predominant VH:VL pair. Still, in some individuals, more unique VH and VL gene segments and VH:VL pairs were observed among the most abundant clones. Thus, the autoantibody response in CD appears strongly stereotypic, although some individual variation is also seen.
The basis for this study is the observation that single TG2-specific IgA- and IgM-expressing PCs can be isolated with TG2 Ag as bait by taking advantage of the expression of surface Ig by PCs (1, 2). Expression cloning and testing of mAbs from 63 single PCs revealed a specificity of 90% in the selection of TG2-specific PCs (6). This high selection efficiency is an important prerequisite for the analysis undertaken in this study.
Technological advances now allow paired sequencing of Ig VH and VL genes from large populations of single B cells (3, 11, 33, 37, 38). Many of the really high-throughput methods, like emulsion droplet-based processing (3, 33), however, only succeed in the analysis of a fraction of the input cells. In settings with analysis of Ag-specific cells, the number of cells available for analysis is usually scarce, and it is important to have methods that successfully report paired Ig VH and VL sequences for a high fraction of the cells. With a paired sequencing efficiency of around 60%, our method reasonably succeeds in this requirement.
One of the most striking features of the anti-TG2 BCRs that emerge from our analysis is the stereotypic nature of the response in both VH and VL gene usage, as well as in VH:VL pairing. The stereotypic VH gene usage in the anti-TG2 response is already established (6, 7, 39). Some evidence was obtained for a stereotypic VL response as well, but the finding was based on analysis of a limited number of cells chiefly from one subject, thus limiting the strength of the observation (6). This study establishes without doubt that there is also a clear stereotypic response in VL gene usage. Our findings also demonstrate a remarkable pairing preference of certain VH and VL genes. The mAbs generated from single PCs of CD patients recognize four major epitopes clustered in the N-terminal region of TG2 (epitopes 1–4), and there is a strong correlation between VH gene usage and the epitope specificity of the Abs (32). A previous detailed analysis of the reactivity of a single IGHV5-51:IGKV1-5–encoded mAb recognizing epitope 1 revealed that the epitope recognition involved both VH and VL residues (24). Of the VH:VL pairs we observed in this study, IGHV5-51:IGKV1-5 was by far the most strongly preferred. The pairing preference thus likely relates to recognition of TG2 where both VH and VL residues are involved. It could also relate to overall protein stability of Abs, because certain IGHV and IGKV/IGLV segments have superior fitness for each other (40). B cells expressing such stable VH:VL pairs will have a competitive advantage during their development, which could explain domination of VH:VL pairs in the repertoire of naive B cells. We used as reference test population non–TG2-specific PCs, which are not fully representative of the naive repertoire, but because IGHV5-51:IGKV1-5 and many of the other preferred pairs were not prevalent among non–TG2-specific PCs, it is unlikely that the bias we observe is dictated to a large extent by protein stability.
The stereotypic anti-TG2 response is further underscored by the calculated Morisita–Horn indices. When analyzing VH and VL sequences and comparing different donors, as well as TG2-specific and nonspecific PCs, we found that TG2-specific PCs of different individuals showed the highest Morisita–Horn indices, indicating closest relatedness.
Another striking feature we observed is the presence of clonal expansions among TG2-specific PCs. Indication of clonal expansion among TG2-specific cells was previously observed based on HTS analysis of pools of cells (7). This study extends this observation because it provides detailed information from single cells. Clonal expansions were typically seen among cells with the preferred VH:VL pairs. For instance, the IGHV5-51:IGKV1-5 pair was frequent among the expanded clones in 80% of the patients. Genealogical analysis revealed that some of the clonal trees were fairly large, with some PCs acquiring a significant number of mutations, but there were also smaller trees that typically had nodes with several replicates of identical BCRs among the single-cell–sorted PCs. Importantly, we observed that the degree of mutations in the H chain was paralleled by the degree of mutations in the L chain. This was the case also for IGHV5-51, suggesting that the low mutation rate of IGHV5-51 is related to the developmental fate of the B cells rather than the inherent propensity of this H chain gene segment to acquire few mutations.
Comparing the mutational pattern in mAbs that dock on the same epitope should provide insights into critical epitope–paratope interactions. Being the most frequent among the collected BCRs, we chose to investigate the HV5-51/KV1-5 pair in more detail. Reactivity of the celiac mAb 679-14E06, which uses this VH:VL pair, was previously studied in detail by small-angle X-ray scattering, dynamic modeling, and mutational analysis (24). The 679-14-E06 mAb is specific for epitope 1, and several key epitope–paratope interaction residues were identified. This analysis of mutations confirms and extends the previous analysis. Residue 106 of the L chain, despite carrying a mutation from Q to H in the 679-14-E06 mAb, was not identified in the previous study. A uniform mutation pattern with half of the BCRs carrying H strongly suggests that H at this position makes contact with TG2. Another striking feature we observed is the conservative pattern of mutations. In general, there were few mutations, and when a mutation occurred, this was usually to a residue with similar functional physicochemical properties. This suggests a good fit for the epitope in germline configuration by IGHV5-51:IGKV1-5 with little drive to mutate away from the mode of interactions enabled by the germline-encoded residues.
Taken together, this study of paired VH and VL sequences derived from single Ag-specific PCs of a disease lesion unravel key features of a highly disease-specific autoantibody response in a human condition. These results motivate this type of analysis to be undertaken in other human autoimmune diseases.
Acknowledgements
We thank Vikas K. Sarna for providing clinicopathological details of investigated CD patients. We thank the CD patients for donating biological material, and the nursing staff and endoscopist at the Endoscopy Unit, Department of Gastroenterology, Oslo University Hospital-Rikshospitalet for invaluable assistance.
Footnotes
This work was supported by the European Commission (Grant ERC-2010-Ad-268541), the Research Council of Norway through its Centres of Excellence funding scheme (Project 179573/V40), Stiftelsen KG Jebsen, and grants from the South-Eastern Norway Regional Health Authority.
The sequence data presented in this article have been submitted to the National Center for Biotechnology Information Short Read Archive (https://www.ncbi.nlm.nih.gov/sra) under accession numbers PRJNA374803 and PRJNA374798.
The online version of this article contains supplemental material.
References
Disclosures
The authors have no financial conflicts of interest.