B cell clonal expansion is vital for adaptive immunity. High-throughput BCR sequencing enables investigation of this process but requires computational inference to identify clonal relationships. This inference usually relies on only the BCR H chain, as most current protocols do not preserve H:L chain pairing. The extent to which paired L chains aid inference is unknown. Using human single-cell paired BCR datasets, we assessed the ability of H chain–based clonal clustering to identify clones. Of the expanded clones identified, <20% grouped cells expressing inconsistent L chains. H chains from these misclustered clones contained more distant junction sequences and shared fewer V segment mutations than the accurate clones. This suggests that additional H chain information could be leveraged to refine clonal relationships. Conversely, L chains were insufficient to refine H chain–based clonal clusters. Overall, the BCR H chain alone is sufficient to identify clonal relationships with confidence.

B cell–mediated immunity relies on Ig Abs produced as a result of B cell clonal expansion. A BCR is the membrane-bound form of an Ab and is made up of H and L chains paired in a heterodimeric fashion. Each chain contains a variable (V) region, and together, the V regions from the H and L chains form the Ag-binding sites. The V regions are formed via V(D)J recombination. In humans, this shuffling process brings together one gene each from numerous IGHV, IGHD, and IGHJ genes for the H chain V (VH) region and one gene each from either IGKV and IGKJ genes or IGLV and IGLJ genes for, respectively, the κ or the λ L chain V (VL) region. Enzyme-mediated editing of the V(D)J junctions and the pairing of H and L chains inject additional diversity (1). During adaptive immune responses, B cells proliferate and further diversify via somatic hypermutation (SHM), forming clones consisting of cells that originated from the same V(D)J recombinant events yet whose BCRs differ at the nucleotide level. As a result, each BCR is largely unique, with recent estimates suggesting 1 × 10¹⁶–10¹⁸ unique paired Abs in the circulating repertoire (2).

Adaptive immune receptor repertoire sequencing allows for high-throughput profiling of the diverse BCR repertoire via full-length V(D)J sequencing in bulk (3). An ensuing challenge is to computationally infer B cell clonal relationships (4). This step is of great importance as the assessment of repertoire properties such as diversity (5) depends on the proper identification of clones, as does the reconstruction of B cell clonal lineages (6) for tracing isotype switching (7) and Ag-specific (8) Abs. To infer clones, differences at the sequence nucleotide level, especially the high diversity in the CDR3 region, can serve as “fingerprints” (9). Likelihood-based (10) and distance-based (11–14) approaches exist. For instance, cells sharing the same IGHV and IGHJ genes and whose H chain junctional sequences are sufficiently similar based on a fixed (11–13) or adaptive (14) distance threshold may be clustered as clones. For validation, existing methods used simulated and experimental H chain sequences (10, 13, 14), measuring the fractions of sequences inferred to be clonally unrelated and clonally related that were truly unrelated and truly related, respectively (specificity and sensitivity). Recently, Nouri and Kleinstein (14) reported both metrics at over 96% based on simulated data.

The majority of current BCR repertoire studies utilize bulk sequencing (15), during which VH:VL pairing is lost (16). In the absence of VH:VL pairing, computational methods for identifying clones have focused on H chain BCR data. This is justified under the assumption that H chain junctional diversity alone should be sufficiently high such that, even without L chains, the likelihood of clonally unrelated cells being clustered together will be negligibly small (13). This reasoning has yet to be rigorously tested with experimental data. Recent breakthroughs in single-cell BCR sequencing technology have enabled the recovery of native VH:VL pairing (17, 18). We now have the opportunity to investigate the extent to which the inclusion of L chains impacts the ability to accurately detect B cell clonal relationships.

Using single-cell VH:VL paired BCR data, we assessed the performance of H chain–based computational methods for identifying clones by measuring the extent to which the inferred clonal members expressed consistent L chains sharing the same V and J genes and junction length. We conclude that clonal members of the majority of the inferred clones exhibited L chain consistency. For the majority of the accurately inferred H chain–based clones, L chain information did not lead to further clonal clustering with greater granularity. At least some of the information gained from paired L chain data was apparent when considering the pattern of shared mutations in the H chain V segment, which is not considered in current distance-based clonal clustering methods, thus offering the potential for further improvements in H chain–based clonal inference.

Four human datasets (Supplemental Fig. 1A) published by 10x Genomics for public use on August 1, 2018 were accessed (https://support.10xgenomics.com/single-cell-vdj/datasets) on November 3, 2018. Two datasets were sorted and produced by direct Ig enrichment of, respectively, CD19+ B cells isolated from PBMCs from a healthy donor and the GM12878 B lymphoblastoid cell line. They contain VH:VL paired reads for individual cells. The other two datasets were unsorted and produced by V(D)J+5′ gene expression profiling of, respectively, PBMCs from a healthy donor and squamous non–small cell lung carcinoma (NSCLC) cells from a fresh surgical resection. These contain gene expression measurements and Ig enrichment with VH:VL pairing. All datasets were generated by 10x Genomics using Cell Ranger (v2.2.0). We used “filtered contigs” and “filtered gene-barcode matrices” for Ig and gene expression, respectively.

A fifth dataset (Supplemental Fig. 1B), described by Croote et al. (19), contains BCR contigs covering full-length V(D)J segments reconstructed from single-cell RNA sequencing of FACS-sorted CD19+ B cells from six food-allergic individuals.

Germline V(D)J gene annotation was performed using IMGT/HighV-QUEST and IgBLAST (v1.10.0). The germline reference used was IMGT release 201839-3. The 10x Genomics datasets also contained annotations by Cell Ranger. IMGT/HighV-QUEST annotations were used as final annotations postfiltering.

For all datasets, only productively rearranged BCR contigs with valid V and J gene annotations, consistent chain annotation (e.g., excluding contigs annotated with an IGHV gene and an IGK/LJ gene), and junctions with nucleotide lengths being a multiple of three were used. A contig had to meet all of these criteria based on annotations from all programs used. Furthermore, only cells with exactly one H chain contig paired with at least one L chain contig were examined. From the two unsorted 10x Genomics datasets with gene expression, we considered only cells displaying a transcriptomic profile consistent with being a B cell. Taking into account the high dropout rate of single-cell RNA sequencing, B cells were defined as any cell with nonzero log-normalized expression for any one of the following genes: pan-B cell markers CD19, CD24, and CD72 (20), plasmablast markers CD38 and MKI67 (21), and the isotype-encoding genes IGHA1, IGHA2, IGHD, IGHE, IGHG1, IGHG2, IGHG3, IGHG4, and IGHM.
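The filtering criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the record fields (`cell_id`, `v_call`, `j_call`, `junction`, `productive`) and all function names are hypothetical, a single annotation source is assumed rather than agreement across several programs, and the one-H-chain rule is assumed to apply after contig filtering.

```python
from collections import defaultdict

# Marker genes listed in the text; any nonzero value marks a cell as a B cell.
B_CELL_GENES = {
    "CD19", "CD24", "CD72", "CD38", "MKI67",
    "IGHA1", "IGHA2", "IGHD", "IGHE",
    "IGHG1", "IGHG2", "IGHG3", "IGHG4", "IGHM",
}

def contig_passes(contig):
    """Productive, V and J annotated, same locus for V and J, in-frame junction."""
    v, j = contig["v_call"], contig["j_call"]
    if not contig["productive"] or not v or not j:
        return False
    if v[:3] != j[:3]:  # e.g. IGHV paired with IGKJ is inconsistent
        return False
    n = len(contig["junction"])
    return n > 0 and n % 3 == 0

def select_cells(contigs):
    """Keep cells with exactly one H chain contig and at least one L chain contig."""
    by_cell = defaultdict(lambda: {"H": [], "L": []})
    for c in contigs:
        if contig_passes(c):
            chain = "H" if c["v_call"].startswith("IGH") else "L"
            by_cell[c["cell_id"]][chain].append(c)
    return {cell: chains for cell, chains in by_cell.items()
            if len(chains["H"]) == 1 and len(chains["L"]) >= 1}

def is_b_cell(expression):
    """expression: mapping from gene name to log-normalized expression."""
    return any(expression.get(g, 0) > 0 for g in B_CELL_GENES)
```

Treating any single nonzero marker as sufficient mirrors the lenient definition in the text, which is motivated by dropout in single-cell RNA sequencing.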

For each dataset, on a per-subject basis, we identified clones using distance-based methods. We used, separately, spectral clustering (SCOPer v0.1.999) (14) and hierarchical clustering (13) (Change-O v0.4.3) (22). Both methods first partitioned cells into groups sharing the same combination of IGHV gene, IGHJ gene, and H chain junction length (H chain V-J-junction-length [VJL] combination), in which junction is defined as the IMGT-numbered codon 104 (conserved Cys) to codon 118 (conserved Phe/Trp) (23). Within each group, based on distances among the H chain junction sequences, a threshold was used to cluster cells within that group into clones. For spectral clustering, adaptive thresholds were chosen by an unsupervised machine learning algorithm. For hierarchical clustering, a subject-specific, fixed threshold was chosen upon inspection of distance-to-nearest-neighbor plots (Supplemental Fig. 1C) (13).
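The two-step logic of fixed-threshold clustering (partition by VJL combination, then cluster by junction distance) can be sketched in Python as follows. This is an illustration only, not the Change-O or SCOPer implementation: it assumes single linkage, a plain length-normalized Hamming distance, and an illustrative threshold of 0.2, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def norm_hamming(a, b):
    """Fraction of differing positions between two equal-length junctions."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def infer_clones(cells, threshold=0.2):
    """cells: {cell_id: (ighv_gene, ighj_gene, junction_nt)}.

    Returns a list of clones, each a sorted list of cell ids.
    """
    # Step 1: partition by H chain V gene, J gene, and junction length.
    groups = defaultdict(list)
    for cell_id, (v, j, junction) in cells.items():
        groups[(v, j, len(junction))].append(cell_id)

    # Step 2: single linkage within each group, via union-find.
    parent = {c: c for c in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for members in groups.values():
        for a, b in combinations(members, 2):
            if norm_hamming(cells[a][2], cells[b][2]) <= threshold:
                parent[find(a)] = find(b)

    clones = defaultdict(list)
    for c in cells:
        clones[find(c)].append(c)
    return sorted(sorted(members) for members in clones.values())
```

Linking every pair within the threshold and taking connected components is equivalent to cutting a single-linkage dendrogram at that height. An adaptive method would replace the fixed `threshold` with a value chosen per group.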

The number and frequency of nucleotide mutations were calculated based on IGH/K/LV positions leading up to the junction region using the “calcObservedMutations” function from SHazaM (v0.1.10) (22). To calculate the number of IGHV mutations shared pairwise between cells from the same clone, we counted the number of positions at which mutations involving the same nucleotide change were observed in both cells.
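The pairwise shared-mutation count described above can be sketched in Python. This is not SHazaM's `calcObservedMutations`; it is a simplified illustration that assumes both cells' IGHV sequences are already aligned, position by position, to the same germline sequence, and the function names are hypothetical.

```python
def observed_mutations(seq, germline):
    """Set of (position, germline_nt, observed_nt) where seq differs from germline.

    Positions with gaps or ambiguous characters in either sequence are skipped.
    """
    return {
        (i, g, s)
        for i, (s, g) in enumerate(zip(seq, germline))
        if s != g and s in "ACGT" and g in "ACGT"
    }

def shared_mutations(seq_a, seq_b, germline):
    """Number of positions carrying the same nucleotide change in both cells."""
    return len(observed_mutations(seq_a, germline) & observed_mutations(seq_b, germline))
```

Two cells mutated at the same position but to different nucleotides do not count as sharing that mutation, which matches the requirement of "the same nucleotide change" in the text.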

We performed clonal relationship inference for five single-cell, VH:VL paired, human BCR datasets, using only the H chain sequence from each cell. The datasets included four publicly available ones from 10x Genomics and one described by Croote et al. (19) (see Materials and Methods). Of these, the B lymphoblastoid GM12878 cell line dataset served as a positive control, as any cells present in a cell line culture can be expected to comprise genetically identical clonal members. A distance-based spectral clustering method (14) was applied to identify clones for each dataset. The datasets contained between 3 and 157 nonsingleton clones (i.e., clones containing at least two cells) (Table I). Because of the small number of clones in each of the six food-allergic individuals in the dataset from (19), we aggregated those results for display after performing analysis on a per-subject basis.

Table I.
Numbers of H chain–based clones and clone sizes

| Dataset | Total Number of Clones | Number of Nonsingleton Clones | Clone Size of Nonsingleton Clones (Number of Clones) |
| --- | --- | --- | --- |
| CD19+ B cells | 8268 | 157 | 2 (136); 3 (15); 4 (4); 5 (2) |
| B cells from NSCLC tumor | 1247 | 98 | 2 (81); 3 (9); 4 (4); 6 (2); 8 (1); 14 (1) |
| B cells from food-allergic individuals | 952 | 12 | 2 (10); 3 (1); 5 (1) |
| B cells from healthy PBMCs | 1105 | 8 | 2 (6); 3 (2) |
| GM12878 cell line | | 3 | 2 (1); 5 (1); 790 (1) |

To assess the extent to which H chain–based clonal clustering captures the underlying biological truth of B cell clonal relationships, we examined whether cells clustered into the same clone based on their H chains alone expressed consistent L chains. Specifically, within the same clone, cells that are truly clonally related should carry L chains comprised of the same combination of IGK/LV gene, IGK/LJ gene, and junction sequences of identical lengths (hereafter referred to as the VJL combination). An inferred clone was considered accurate if all of its clonal members carried L chains with the same VJL combination and “misclustered” otherwise. Using spectral clustering with adaptive thresholds, 83–97% of the inferred clones were accurate (Fig. 1A). Another distance-based hierarchical clustering method using a fixed distance threshold (13) yielded similar results (Fig. 1B), and therefore, we focus on presenting results from spectral clustering hereafter. To test the possibility that the observed accuracy arose by chance, we randomly permuted the VH:VL pairings of the cells while maintaining their H chain–based clustering structures. Across 100 permutations, only 1–6% (SD 1–8%) of the inferred clones were accurate by chance (Fig. 1A). Overall, these results show that H chain–based clonal clustering can determine clonal relationships with reasonable confidence (>80%) in terms of L chain consistency.
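The accuracy criterion and the permutation control can be sketched in Python. This is a simplified illustration, not the analysis code from the study: it assumes each cell carries exactly one L chain, represents the L chain VJL combination as a tuple, and uses hypothetical function names.

```python
import random

def is_accurate(clone, light_vjl):
    """True if every cell in the clone has the same L chain VJL combination."""
    return len({light_vjl[cell] for cell in clone}) == 1

def fraction_accurate(clones, light_vjl):
    """Fraction of nonsingleton clones whose members share one L chain VJL."""
    nonsingleton = [c for c in clones if len(c) >= 2]
    return sum(is_accurate(c, light_vjl) for c in nonsingleton) / len(nonsingleton)

def permuted_fractions(clones, light_vjl, n_perm=100, seed=0):
    """Shuffle L chains across cells while keeping the H chain clusters fixed."""
    rng = random.Random(seed)
    cells = sorted(light_vjl)
    fractions = []
    for _ in range(n_perm):
        shuffled = [light_vjl[c] for c in cells]
        rng.shuffle(shuffled)
        fractions.append(fraction_accurate(clones, dict(zip(cells, shuffled))))
    return fractions
```

Comparing the observed fraction with the distribution returned by `permuted_fractions` indicates how much L chain consistency would be expected if VH:VL pairing carried no clonal information.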

FIGURE 1.

Performance of H chain–based (A) spectral clustering with adaptive threshold and (B) hierarchical clustering with fixed threshold. Solid and shaded bars show percentages of nonsingleton inferred clones in which clonal members all carried L chains with the same VJL combination. Numbers at the top indicate the denominators. A background distribution was generated by permuting VH:VL pairings 100 times while maintaining inferred clonal lineage structures. Hollow bars show average percentages of accurate clones across permutations with SEs. * denotes empirical one-sided, Bonferroni-corrected p value <0.05.


We next investigated the possibility that the observed level of confidence was deflated by factors unrelated to the clonal clustering method itself. We considered the possibility that the misclustered clones (Supplemental Fig. 2A) arose because of erroneous barcoding during sequencing preparation, which has the potential to link together H and L chains from unrelated cells. We reasoned that incorrectly paired H and L chains would show a decreased correlation in their SHM frequencies relative to correctly paired chains. Thus, we computed the Pearson correlation coefficient between IGHV and IGK/LV mutation frequencies for cells expressing nonmajority L chain VJL combinations from misclustered clones. We found no significant difference (p = 0.926, Fisher r-to-z transformation and z test) between the levels of correlation for misclustered clones (0.761) and for accurate clones (0.769), suggesting that erroneous barcoding was unlikely to be a concern. In addition, we considered the possibility of poor germline V/J gene annotation for the L chains leading to a false appearance of L chain inconsistency. Because higher SHM frequencies are associated with increasing V(D)J annotation errors, we compared the L chain mutation frequency in misclustered and accurate clones. We found that the average IGK/LV mutation frequency across cells expressing nonmajority L chain VJL combinations in misclustered clones was not significantly higher than that across cells in accurate clones (p = 0.957; Supplemental Fig. 2B). Overall, these results based on the analysis of SHM suggest that the observed confidence for accurately identifying clones using H chain–based clonal clustering was not deflated by a false appearance of L chain inconsistency created by erroneous barcoding or incorrect L chain germline gene annotation.
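For readers unfamiliar with the Fisher r-to-z comparison of two independent correlations, a minimal Python sketch follows. The function name is hypothetical, and the sample sizes in the usage note are placeholders because the numbers of cells behind each coefficient are not restated here; the output is therefore not expected to reproduce the reported p value.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference between two independent Pearson r values.

    Each r is mapped to z = atanh(r), which is approximately normal with
    variance 1 / (n - 3).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p
```

For example, `compare_correlations(0.761, 40, 0.769, 400)` uses the two coefficients from the text with made-up sample sizes of 40 and 400 cells.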

Current distance-based, H chain–based clonal clustering methods use information confined to VJL combination and distances between junction sequences to identify clonally related sequences. We investigated the characteristics of the H chains of cells from misclustered clones to determine whether there was additional information in the H chains that could improve the clustering. It has been noted that shorter H chain junction lengths rendered lower and possibly insufficient diversity for effectively distinguishing clonal members from nonclonal ones (13). However, we found no significant difference between the H chain junction lengths of accurate clones and those of misclustered clones (p = 0.810, Fig. 2A, Supplemental Fig. 2C). Thus, it is unlikely that H chain junction length could serve as an effective indicator for misclustered clones.

FIGURE 2.

Characteristics of misclustered clones. (A) H chain junction lengths of accurate clones versus misclustered clones. (B) The maximum pairwise distance between H chain junction sequences in accurate clones versus the minimum pairwise distance between cells expressing inconsistent L chains in misclustered clones. (C) The minimum pairwise shared IGHV mutations in accurate clones versus the maximum pairwise shared IGHV mutations between cells expressing inconsistent L chains in misclustered clones. Analysis was performed for each dataset separately (Supplemental Fig. 2C–E), and a Fisher combined p value was calculated.
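The Fisher combined p value mentioned in the legend pools per-dataset p values into one. A minimal Python sketch, assuming the per-dataset tests are independent and using a hypothetical function name, is:

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k df.

    For even degrees of freedom the chi-square tail has a closed form:
    P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

With a single p value the function returns that value unchanged, which is a quick sanity check on the formula.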


A central component of distance-based clonal clustering is the choice of a distance threshold that determines how dissimilar the junction sequence can be before it is unlikely for cells to be clonally related. To determine whether a better choice of distance threshold had the potential to correct misclustered clones, we compared the maximum pairwise distance between H chain junction sequences of cells in accurate clones with the minimum pairwise distance between cells carrying L chains with different VJL combinations in misclustered clones. Cells with inconsistent L chains in misclustered clones had significantly more dissimilar H chain junction sequences compared with cells in accurate clones (p = 5.6 × 10⁻⁹, Fig. 2B, Supplemental Fig. 2D). This implies that some of the misclustered clones could have been corrected by using a numerically lower (stricter) distance threshold and that this lower threshold would not break apart the accurate clones.
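The two per-clone summary statistics compared in this paragraph can be sketched in Python. The sketch assumes a length-normalized Hamming distance between H chain junctions of equal length and one L chain label per cell; the function names are hypothetical.

```python
from itertools import combinations

def norm_hamming(a, b):
    """Fraction of differing positions between two equal-length junctions."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def max_pairwise_distance(junctions):
    """Accurate clone: the largest H chain junction distance among its cells."""
    return max(norm_hamming(a, b) for a, b in combinations(junctions, 2))

def min_distance_between_inconsistent(cells):
    """Misclustered clone: the smallest H chain junction distance between
    two cells whose L chain VJL combinations differ.

    cells: list of (h_junction, light_vjl) pairs. Raises ValueError if all
    cells share one L chain VJL, i.e. the clone is not misclustered.
    """
    return min(
        norm_hamming(ja, jb)
        for (ja, la), (jb, lb) in combinations(cells, 2)
        if la != lb
    )
```

Taking the maximum for accurate clones and the minimum for misclustered clones makes the comparison conservative: a threshold lying between the two values would keep every accurate pair together while separating every inconsistent pair.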

True B cell clones are expected to share mutations resulting from SHM followed by clonal expansion and/or positive selection (12, 14). Hershberg and Luning Prak (12) suggested a minimum threshold of four shared mutations for inferring clones. We investigated whether cells in misclustered clones shared fewer mutations in their IGHVs compared with cells in accurate clones. To do so, we compared the minimum number of shared mutations between cells in accurate clones with the maximum number of shared mutations between cells carrying L chains with different VJL combinations in misclustered clones. We found that cells carrying inconsistent L chains in misclustered clones shared significantly fewer mutations compared with cells in accurate clones (p = 2.8 × 10⁻⁵, Fig. 2C, Supplemental Fig. 2E). Using Hershberg and Luning Prak’s (12) threshold of four shared mutations, 26 of the 30 misclustered clones (87%) would not have been clustered together based on their H chains (thus increasing specificity). In contrast, within four of these misclustered clones, a subset of cells with consistent L chains would also become separated (thus potentially reducing sensitivity). Although the tradeoff between sensitivity and specificity needs further investigation, these results suggest that the extent of shared mutations in the IGHV is a potentially useful characteristic to consider in distance-based clonal clustering methods.

Given the availability of paired L chains in the single-cell datasets that we analyzed, we assessed the value added from that information. For misclustered clones (Supplemental Fig. 2A), this was trivial as any L chain inconsistency was immediately resolved by regrouping the cells into smaller clusters based on their L chain VJL combinations. For accurate clones, although the cells express consistent L chains, it is possible that these clusters may still contain multiple true clones grouped together. We thus investigated the extent to which further clustering cells based on the similarity of their L chain junctions (analogous to H chain–based clustering) would further split the accurate clones. When clustering using the H chains, a threshold around 0.2 normalized Hamming distance tended to separate clonally related cells from unrelated ones (Supplemental Fig. 1C). However, applying this same threshold to cluster L chains within accurate clones added virtually no information. The L chain junction regions of cells in accurate clones were highly similar and significantly more so compared with their H chain counterparts (Bonferroni-corrected p values < 0.001) (Supplemental Fig. 2F). In all but four of the accurate clones, the L chain junction sequence of a clonal member was at most 0.2 normalized Hamming distance away from the junction sequence of another clonal member most similar to itself (Supplemental Fig. 2F). In other words, clonal members tended to carry L chains with junction sequences that were at least 80% similar to each other.

We next investigated whether there would be further clustering based on L chain junctions at lower distance thresholds ranging, in increasing order of stringency, from 0.15 to 0.05 normalized Hamming distance while bearing in mind that one could always artificially yield further clustering by imposing an increasingly stricter clustering threshold. At each clustering threshold, we determined the percentage of H chain–based clones that were further clustered on the basis of distances between their L chain junction sequences (Table II). On average, 5% of the H chain–based accurate clones inferred via spectral clustering were further clustered at 0.15, the most lenient threshold explored. Even at 0.05, the strictest threshold explored, only 23.2% of the accurate clones were further clustered. This threshold is approaching the mean L chain SHM frequency, which ranges from 0.01 to 0.05 across the datasets, raising questions as to whether such further clustering is artificial rather than biological. Overall, L chain information does not support clonal clustering with greater granularity for the majority of H chain–based accurate clones.
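The threshold sweep described above can be sketched in Python. This is an illustration under stated assumptions rather than the study's code: it uses single linkage with a length-normalized Hamming distance on L chain junctions, which all have the same length within an accurate clone, and the function names are hypothetical.

```python
from itertools import combinations

def norm_hamming(a, b):
    """Fraction of differing positions between two equal-length junctions."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def n_subclusters(junctions, threshold):
    """Number of single-linkage clusters among one clone's L chain junctions."""
    parent = list(range(len(junctions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(junctions)), 2):
        if norm_hamming(junctions[i], junctions[j]) <= threshold:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(junctions))})

def percent_further_clustered(clones, threshold):
    """Percentage of clones that split into more than one L chain cluster.

    clones: list of clones, each a list of L chain junction sequences.
    """
    split = sum(n_subclusters(c, threshold) > 1 for c in clones)
    return 100.0 * split / len(clones)
```

Calling `percent_further_clustered` at 0.15, 0.10, and 0.05 on the accurate clones of one dataset would yield one row of a table like Table II. As the text notes, the percentage can only grow as the threshold is tightened.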

Table II.
Percentage of accurate H chain–based clones further clustered based on L chains

| Dataset | Threshold 0.05 | Threshold 0.10 | Threshold 0.15 |
| --- | --- | --- | --- |
| CD19+ B cells | 20.3 | 6.8 | 3.0 |
| B cells from NSCLC tumor | 32.6 | 18.9 | 16.8 |
| B cells from food-allergic individuals | 40.0 | 10.0 | 0.0 |
| B cells from healthy PBMCs | 0.0 | 0.0 | 0.0 |
| Average | 23.2 | 8.9 | 5.0 |

Values are the percentage of H chain–based clones undergoing further clustering at each clustering threshold for L chain junctions.

The unit of measure for the clustering threshold for L chain junctions is normalized Hamming distance.

In this study, we investigated the accuracy of H chain–based clonal inference. With single-cell VH:VL paired BCR datasets, we performed B cell clonal inference using only the H chains, effectively treating the datasets as if bulk sequenced and unpaired. Over 80% of the inferred clones were accurate as defined by L chain consistency. Within the majority of these accurate clones, an additional round of clustering using the L chain sequence failed to yield finer resolution (<10% at a threshold of 0.1 normalized Hamming distance). Including a requirement that members of a clone share mutations between H chain sequences would have corrected 87% of the misclustered clones, although at the expense of also breaking up the accurately clustered part in 13% of these misclustered clones. Overall, whereas there remains additional information from the H chain that could be leveraged for improvement, we found H chain–based clustering alone capable of identifying clonal relationships with reasonable confidence.

Cells from four of the misclustered clones (13%) carried closely related H chains sharing a reasonable number of mutations in the V segment, yet they expressed L chains with different VJL combinations. There are several possibilities for how such clones could arise. During B cell maturation, the H chain rearranges first, and the cell proliferates before the L chain rearranges (1). Thus, it is possible that the misclustered clonal relationships we detect represent daughter cells of the same H chain VDJ-rearranged B cell that developed different L chains. Alternatively, a group of identical, autoreactive, immature B cells could have undergone receptor editing, during which independent rearrangements gave rise to different L chains paired with an identical H chain rearrangement. In these scenarios, the H chain–based clonal clustering accurately reflects the underlying biology. However, we believe that these scenarios are unlikely as the chance that cells with identical H chains but different L chains would developmentally share the same temporal and/or spatial trajectories, in addition to being sampled together for sequencing, is expected to be very small.

When examining L chain consistency of cells from an inferred clone, if more than one L chain sequence was associated with a cell, we checked every sequence present for a match against the clonal majority VJL combination. We did not, however, consider the possible complication in which there was partial sampling in such a cell. Hypothetically, should two clonally related B cells with dual L chains each have a different L chain sequenced, our analysis would have considered an H chain–based clone containing these cells misclustered. However, such B cells have been reported to be rare, especially outside autoimmunity, with dual-κ and dual-λ cells estimated to occupy ∼2–10% (24, 25) and 0.2% (26) of the normal murine repertoire, consistent with the percentages of cells with multiple L chains observed in the single-cell datasets in this study (dual: 1.00–7.06%; triple: 0.00–0.31%; quadruple: 0.00–0.02%; all cells in the dataset from (19) had exactly one L chain).

Our results suggest several ways that H chain–based clonal clustering could be improved. For example, H chain junctions were more similar to each other in accurate clones than in misclustered clones, suggesting that the clustering threshold was perhaps too lenient for the latter. We also observed that accurate clones shared more mutations in their V segment. Some likelihood-based methods, such as partis, implicitly take into consideration shared mutations via the use of a multi-Hidden Markov model that simultaneously emits multiple sequences (10). Although such methods are computationally slower than the distance-based methods explored in this study, future studies should explore their potential benefit, as they use both junction distance and shared mutation patterns.

A limitation of this study is that all but one of the datasets contained relatively small numbers of B cells, and none was sorted for specific subsets, such as memory B cells, that would enrich for expanded clones. As a result, only a small number of the inferred clones contained multiple cells and were thus suitable for analysis. As high-quality, single-cell B cell datasets of higher throughput with sorting for relevant B cell subsets emerge, similar analyses could be performed, leading to better estimates of performance. In addition, although L chains were insufficient to refine accurate H chain–based clones, they did identify ∼20% of the clones as misclustered and can be critical in other contexts, such as synthesizing functional Abs and evaluating Ag-binding specificity (27).

Clonal relationship inference is an early step crucial for computational BCR repertoire analyses. As studies taking advantage of the relatively low cost of bulk BCR sequencing continue to generate unpaired BCR data, current clonal clustering methods can determine most clonal relationships with reasonable confidence using H chains only, and their performance may continue to improve by considering additional characteristics such as the number of shared mutations in the H chains.

We thank Dr. Nima Nouri for critical reading of this manuscript.

This work was supported in part by the National Institutes of Health under Award R01AI104739.

The online version of this article contains supplemental material.

Abbreviations used in this article: NSCLC, non–small cell lung carcinoma; SHM, somatic hypermutation; V, variable; VH, H chain V; VJL, V-J-junction-length; VL, L chain V.

1. Murphy, K., C. Weaver. 2017. Janeway’s Immunobiology, 9th Ed. Garland Science, New York.
2. Briney, B., A. Inderbitzin, C. Joyce, D. R. Burton. 2019. Commonality despite exceptional diversity in the baseline human antibody repertoire. Nature 566: 393–397.
3. Robins, H. 2013. Immunosequencing: applications of immune repertoire deep sequencing. Curr. Opin. Immunol. 25: 646–652.
4. Greiff, V., E. Miho, U. Menzel, S. T. Reddy. 2015. Bioinformatic and statistical analysis of adaptive immune repertoires. Trends Immunol. 36: 738–749.
5. Gibson, K. L., Y.-C. Wu, Y. Barnett, O. Duggan, R. Vaughan, E. Kondeatis, B.-O. Nilsson, A. Wikby, D. Kipling, D. K. Dunn-Walters. 2009. B-cell diversity decreases in old age and is correlated with poor health status. Aging Cell 8: 18–25.
6. Hoehn, K. B., G. Lunter, O. G. Pybus. 2017. A phylogenetic codon substitution model for antibody lineages. Genetics 206: 417–427.
7. Horns, F., C. Vollmers, D. Croote, S. F. Mackey, G. E. Swan, C. L. Dekker, M. M. Davis, S. R. Quake. 2016. Lineage tracing of human B cells reveals the in vivo landscape of human antibody class switching. [Published erratum appears in 2016 Elife 5.] Elife 5: e16578.
8. Trück, J., M. N. Ramasamy, J. D. Galson, R. Rance, J. Parkhill, G. Lunter, A. J. Pollard, D. F. Kelly. 2015. Identification of antigen-specific B cell receptor sequences using public repertoire analysis. J. Immunol. 194: 252–261.
9. Dunn-Walters, D., C. Townsend, E. Sinclair, A. Stewart. 2018. Immunoglobulin gene analysis as a tool for investigating human immune responses. Immunol. Rev. 284: 132–147.
10. Ralph, D. K., F. A. Matsen IV. 2016. Likelihood-based inference of B cell clonal families. PLOS Comput. Biol. 12: e1005086.
11. Kepler, T. B., S. Munshaw, K. Wiehe, R. Zhang, J.-S. Yu, C. W. Woods, T. N. Denny, G. D. Tomaras, S. M. Alam, M. A. Moody, et al. 2014. Reconstructing a B-cell clonal lineage. II. Mutation, selection, and affinity maturation. Front. Immunol. 5: 170.
12. Hershberg, U., E. T. Luning Prak. 2015. The analysis of clonal expansions in normal and autoimmune B cell repertoires. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370: 20140239.
13. Gupta, N. T., K. D. Adams, A. W. Briggs, S. C. Timberlake, F. Vigneault, S. H. Kleinstein. 2017. Hierarchical clustering can identify B cell clones with high confidence in Ig repertoire sequencing data. J. Immunol. 198: 2489–2499.
14. Nouri, N., S. H. Kleinstein. 2018. A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data. Bioinformatics 34: i341–i349.
15. Nielsen, S. C. A., S. D. Boyd. 2018. Human adaptive immune receptor repertoire analysis-past, present, and future. Immunol. Rev. 284: 9–23.
16. Georgiou, G., G. C. Ippolito, J. Beausang, C. E. Busse, H. Wardemann, S. R. Quake. 2014. The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat. Biotechnol. 32: 158–168.
17. DeKosky, B. J., G. C. Ippolito, R. P. Deschner, J. J. Lavinder, Y. Wine, B. M. Rawlings, N. Varadarajan, C. Giesecke, T. Dörner, S. F. Andrews, et al. 2013. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat. Biotechnol. 31: 166–169.
18. Busse, C. E., I. Czogiel, P. Braun, P. F. Arndt, H. Wardemann. 2014. Single-cell based high-throughput sequencing of full-length immunoglobulin heavy and light chain genes. Eur. J. Immunol. 44: 597–603.
19. Croote, D., S. Darmanis, K. C. Nadeau, S. R. Quake. 2018. High-affinity allergen-specific human antibodies cloned from single IgE B cell transcriptomes. Science 362: 1306–1309.
20. LeBien, T. W., T. F. Tedder. 2008. B lymphocytes: how they develop and function. Blood 112: 1570–1580.
21. Fink, K. 2012. Origin and function of circulating plasmablasts during acute viral infections. Front. Immunol. 3: 78.
22. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, S. H. Kleinstein. 2015. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.
23. Lefranc, M.-P., C. Pommié, M. Ruiz, V. Giudicelli, E. Foulquier, L. Truong, V. Thouvenin-Contet, G. Lefranc. 2003. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev. Comp. Immunol. 27: 55–77.
24. Casellas, R., Q. Zhang, N.-Y. Zheng, M. D. Mathias, K. Smith, P. C. Wilson. 2007. Igkappa allelic inclusion is a consequence of receptor editing. J. Exp. Med. 204: 153–160.
25. Velez, M.-G., M. Kane, S. Liu, S. B. Gauld, J. C. Cambier, R. M. Torres, R. Pelanda. 2007. Ig allotypic inclusion does not prevent B cell development or response. J. Immunol. 179: 1049–1057.
26. Pelanda, R. 2014. Dual immunoglobulin light chain B cells: trojan horses of autoimmunity? Curr. Opin. Immunol. 27: 53–59.
27. Robinson, W. H. 2015. Sequencing the functional antibody repertoire--diagnostic and therapeutic discovery. Nat. Rev. Rheumatol. 11: 171–182.

The authors have no financial conflicts of interest.
