Analyses of somatic hypermutation (SHM) patterns in B cell Ig sequences have important basic science and clinical applications, but they are often confounded by the intrinsic biases of SHM targeting on specific DNA motifs (i.e., hot and cold spots). Modeling these biases has been hindered by the difficulty in identifying mutated Ig sequences in vivo in the absence of selection pressures, which skew the observed mutation patterns. To generate a large number of unselected mutations, we immunized B1-8 H chain transgenic mice with nitrophenyl to stimulate nitrophenyl-specific λ+ germinal center B cells and sequenced the unexpressed κ L chains using next-generation methods. Most of these κ sequences had out-of-frame junctions and were presumably uninfluenced by selection. Despite being nonfunctionally rearranged, they were targeted by SHM and displayed a higher mutation frequency than functional sequences. We used 39,173 mutations to construct a quantitative SHM targeting model. The model showed targeting biases that were consistent with classic hot and cold spots, yet revealed additional highly mutable motifs. We observed comparable targeting for functional and nonfunctional sequences, suggesting similar biological processes operate at both loci. However, we observed species- and chain-specific targeting patterns, demonstrating the need for multiple SHM targeting models. Interestingly, the targeting of C/G bases and the frequency of transition mutations at C/G bases was higher in mice compared with humans, suggesting lower levels of DNA repair activity in mice. Our models of SHM targeting provide insights into the SHM process and support future analyses of mutation patterns.

Somatic hypermutation (SHM) is a process that diversifies BCRs by introducing point mutations into Ig genes at a high rate (1). SHM is initiated when activation-induced cytidine deaminase (AID) is recruited to the Ig locus and converts cytosine (C) to uracil (U). Error-prone DNA repair pathways are then activated, resulting in somatic mutations either at the AID-targeted C/G base pair (phase I) or at neighboring base pairs (phase II) (2). Although stochastic, SHM is biased by the local DNA sequence context and preferentially introduces mutations at specific DNA motifs (hot spots) while avoiding others (cold spots) (35). SHM plays a crucial role in the B cell immune response and immune-mediated disorders. The analysis of mutation patterns and distributions has been widely used to infer selective processes involved in such responses (6). However, the analysis of SHM patterns can be confounded by the intrinsic biases of SHM targeting, driving the need for accurate characterization of targeting patterns that reflect inherent SHM properties in the absence of Ag-driven selection (7, 8).

The SHM process can be quantitatively characterized by a targeting model, consisting of a mutability model, which specifies the relative mutation frequency of DNA microsequence motifs, and a substitution model, which describes the specific nucleotide substitution frequencies at the mutated sites (913). These models can serve as a background distribution for statistical analysis of mutation patterns in Ig sequences, improving the ability to detect deviations in SHM pathways related to diseases or identify selected mutations that drive Ag specificity and affinity maturation (7, 8). However, modeling these intrinsic biases has been limited by the lack of large sets of Ig sequences that have undergone SHM in vivo in the absence of selection pressures. Previous work has focused on studying mutations in intronic regions or in Ig sequences that were determined to be nonfunctional (e.g., because of an out-of-frame junction) (911, 13). However, intronic regions have limited diversity and a different base composition from exonic V(D)J regions, and some mutations in nonfunctional sequences may be subject to selection pressures if the sequences were rendered nonfunctional during the affinity maturation process. Another strategy to determine targeting models involves using mutations that do not alter the amino acid sequence (i.e., silent or synonymous mutations), which presumably are not subject to selection pressures. We previously used this strategy to construct the human silent, 5-mer, functional (S5F) SHM targeting model from >800,000 mutations in functional Ig sequences (12). Despite the high resolution of this S5F model, the mutability of some DNA motifs could not be estimated directly because they do not yield silent mutations.

Modeling and analysis of SHM would also benefit from a clear understanding of whether equivalent models can be used across chains (L and H) and species (mouse and human). L and H chain genes are located on different chromosomes; thus, different regulatory elements and epigenetic effects may impact microsequence targeting specificity (1417). Previously, Shapiro et al. (9) reported comparable dinucleotide and trinucleotide mutabilities between L and H chains, and between mouse and human sequences. However, the small sequence database and short motif comparisons limited the resolution of this analysis.

In this study, we use a novel experimental strategy based on the (4-hydroxy-3-nitrophenyl)acetyl (NP) immunization system in mice, combined with next-generation sequencing technologies, to generate a large set of unselected mutations from nonfunctionally rearranged Ig κ-chain sequences. For comparison, we also sequenced L and H chains from healthy human subjects. Using a modified version of the 5-mer motif framework previously developed (12), we have used this novel database of mutations spanning the full V gene to develop a next-generation targeting model. We found that targeting patterns on nonfunctional sequences resemble those on functional sequences, but revealed unexpected chain- and species-specific differences in SHM targeting. These models provide insights into the SHM process and support future analyses of SHM patterns.

Four mIgM Tg JHD−/− BALB/c strain female mice (18), 6–10 wk old, were immunized with NP-CGG in alum adjuvant. Twenty-eight days after immunization the mice were sacrificed and splenocytes were harvested. Germinal center (GC) cells (live, B220+, NIP+, CD95+, CD38) were FACS-sorted, leaving 64,000, 87,000, 120,000, and 107,000 cells, respectively. As a control, non-GC B cells expressing κ L chain with a B220+, CD95, CD38+, λ phenotype were FACS-sorted from two nonimmunized mice, leaving 500,000 and 400,000 cells, respectively. RNA was isolated from sorted cells using the RNeasy Mini kit (Qiagen) per the manufacturer’s protocol and stored at −80°C.

Four healthy donors were recruited for the assessment of B cell Ab repertoire analysis. Specimens were collected after informed written consent was obtained, under a protocol approved by the Human Research Protection Program at Yale School of Medicine. PBMCs were isolated from whole blood by Ficoll-Paque gradient centrifugation. B cells were first enriched with CD20 MicroBeads (Miltenyi Biotec); then memory (CD27+) B cells were isolated through FACS on a FACSAria flow cytometer (BD) into bulk collections before RNA isolation and sequencing. For cytometric analysis and sorting, enriched B cells were stained with PerCP-Cy5.5 anti-human CD27 (clone O323), Pacific Blue anti-human CD19 (clone HIB 19) (both from BioLegend), and with FITC anti-human IgM (clone G20-127; BD). For this study, memory and bulk unsorted B cell populations for subjects HD09, HD10, and HD13 were analyzed; only bulk unsorted B cells were analyzed for subject HD07.

RNA from sorted memory B cells was extracted via Qiagen RNeasy kit as per manufacturer’s recommendation. A total of 250 ng of RNA input was used for each sample. Immune sequencing library construction for human samples was performed as previously described using human V region H, V region L-λ and V region L-κ C region primer mix in each reaction (19, 20), and for mouse samples was performed using mouse V region L-λ and V region L-κ primer mix (21). In brief, RNA was reverse transcribed using biotin-dT oligonucleotide and barcoded on the 5′ end in a template switchlike reaction using an oligonucleotide harboring a semidegenerate 17-nt-long unique identifier (UID) barcode and a universal flanking sequence. The cDNA was purified using streptavidin bead pulldown and then subjected to a first round of 12 cycles of PCR. The human samples used a human C region primer mix composed of Ig H chain C region, Ig κ L chain C region, and Ig λ L chain C region flanked by another universal sequence (19, 20), and the mouse samples used a mouse primer mix composed of Ig κ L chain C region and Ig λ L chain C region (21). A second round of 12 cycles of PCR was then performed using universal overhang from both ends to add the required Illumina sequencing and clustering adapters. The samples were then pooled at equimolar ratio and subjected to sequencing on the Illumina MiSeq platform at 325 × 275 bp. Raw mouse and human sequences are available via the Sequence Read Archive under BioProject accession numbers PRJNA283640 and PRJNA338795, respectively. Human samples included in this study were restricted to BioSample accession numbers SAMN05570585, SAMN05570589, SAMN05570593, SAMN05570596, SAMN05570598, SAMN05570602, SAMN05570604.

Raw sequences were processed using the Repertoire Sequencing Toolkit (pRESTO) (22) as previously described (12). The script containing detailed procedures for Ig sequence preprocessing is included in the supplemental material. In summary, we removed low-quality sequences, annotated sequences by UID (corresponding to the same mRNA molecule) and isotype-specific primer, generated a consensus sequence for each UID, and assembled pair-end reads. To obtain high-fidelity sequences, we constrained the reads being used for mutation analysis to appear at least twice. Furthermore, a sliding window approach was used to eliminate sequences with ≥6 mutations in 10 consecutive nucleotides. IMGT was used to assign germline V(D)J segments and determine functionality (23). Mutations were defined as nucleotides that were different from the inferred germline sequence. Clustering of sequences into clonal groups was carried out using the Change-O command line tool (24). H chain sequences were assigned into clonal groups on the basis of identical V gene, J gene, and junction length, with a weighted intraclonal distance threshold of 10 using the substitution probabilities previously described (25). L chain sequences were assigned into clonal groups on the basis of identical V gene, J gene, and junction sequence.

The empirical sequencing error rate for each sequence was computed by comparing the number of mismatched nucleotides with the consensus sequence in each UID group. The UID consensus sequence was obtained by majority rule at each nucleotide base in each UID group. The error rate of UID sequences was estimated by subtracting the probability that each base is correct (defined by more than half of the sequences at each position being correct) from 1. The actual error is expected to be lower because sequencing may result in distinct errors, in which case less than half of correct bases at a position would result in the correct UID consensus.

The construction of silent (S) 5-mer models was previously described (12). The replacement and silent (RS) model was constructed using the same method except that both silent and replacement mutations were included.

The SHazaM R package (version 0.1.1) was used to compute the targeting models (12, 24). For substitution models, the values of 5-mers with <20 mutations were inferred. Inference was performed by first averaging substitution frequencies of all mutations in 5-mers matching the same inner 3-mers. If the number of mutations was still insufficient (<20), substitution frequencies of all mutations in 5-mers matching the same middle base were averaged. In the mutability models, 5-mers with <500 mutations in sequences containing the mutated 5-mer were inferred. If the mutated base was A or T, inference was done by averaging the mutability of all 5-mers sharing the same inner 3-mer. For the base C, inference was based on 5-mers sharing the same upstream 3-mer (the mutated base and two nucleotides upstream of it), whereas for the base G, inference was based on 5-mers sharing the same downstream 3-mer (the mutated base and two nucleotides downstream of it).

Sequencing-error models were constructed for the purposes of comparing with targeting models. Combined PCR and MiSeq sequencing error (hereafter referred to as sequencing error) profiles were generated using sequences from large UID groups (>20 sequences) in mice. A consensus sequence was constructed per group through majority rule at each nucleotide position. Within each UID group, the deviations from consensus sequences were counted as sequencing errors and used to construct the error models. Targeting models for these errors were built the same way as the 5-mer RS targeting model, except that all sequences were treated independently (instead of collapsing mutations for each clone). 5-mer motifs with <20 mutations in the center base or <500 mutations in any base in sequences containing the mutated motif were excluded. Samples that did not have any 5-mers matching the criteria were excluded from the study.

For each pair of samples, the distance was defined as follows: 1 − Spearman correlation coefficient between any two targeting models, considering only the motifs observed in both models. The R function prcomp was used to project the distance matrix onto the first two principal components.

Ten simulated repertoires were generated for each SHM model (mouse L chain, human L chain, and human H chain). For each repertoire, 2000 sequences were simulated with five mutations per sequence, roughly the same mutation levels as observed in the GC cells from mice analyzed in this study. The mutations were added to the sequences stochastically using a mutation algorithm previously described (8). All the simulations were initiated with the germline Mus mus IGKV1-135*01, the most abundant V segment among nonfunctional chains in the NP-immunized mice. The simulation results were compared across different SHM targeting models by calculating the number of different mutations (defined as mutations at different positions or mutations to different bases) between repertoires divided by the total number of mutations.

To generate a large number of unselected mutations necessary to precisely model SHM targeting, we carried out next-generation repertoire sequencing (Rep-Seq) (26) in NP-immunized mIgM Tg mice (18). These mice express a transgenic H chain (B1-8) that binds the NP Ag when paired with λ, but not κ, L chains. We reasoned that NP-binding B cells in these mice would likely carry nonfunctionally rearranged κ-chains, because λ-chains are generally used only after failure to generate a productive κ-chain rearrangement during B cell development (27). Next-generation sequencing of κ L chains of sorted GC NP-binding B cells from four mice 28 d postimmunization with NP produced 25,341 distinct κ L chain sequences (Table I). Consistent with our expectations, 20,194 sequences (80%) were identified as nonfunctional because they contained an out-of-frame junction (92%) or a stop codon (8%) (Supplemental Fig. 1). The small fraction of in-frame κ L chain sequences may be derived from B cells reacting to the NP carrier protein (CGG) that contaminated the sort, or may represent cells carrying both a productive κ- and λ-chain (28, 29).

Table I.
Mouse sequencing data
No. of Unique Sequences
No. of Clones
No. of Mutations
SampleChainProcessedFunctionalTotalFunctionalNonfunctionalaFunctionalNonfunctionalaFunctional SilentNonfunctional Silenta
NP1 κ 2,526 534 1,163 232 942 739 4,225 183 1,081 
 λ 25,755 21,332 1,298 516 794 2,728 3,691 718 703 
NP2 κ 8,703 1,555 2,671 464 2,233 1,546 12,489 370 3,347 
 λ 84,141 70,990 2,552 930 1,653 4,923 7,874 1,440 1,566 
NP3 κ 10,006 2,226 3,245 656 2,626 2,305 14,992 565 3,758 
 λ 89,976 76,406 2,720 1,045 1,711 5,528 8,508 1,590 1,767 
NP4 κ 4,106 832 1,813 373 1,459 1,095 7,467 252 1,964 
 λ 50,587 42,523 1,923 658 1,281 3,639 6,214 1,017 1,230 
Total κ 25,341 5,147 8,892 1,725 7,260 5,685 39,173 1,370 10,150 
 λ 250,459 211,251 8,493 3,149 5,439 16,818 26,287 4,765 5,266 
           
C1 κ 89,707 76,245 15,688 9,905 5,989 18,984 7,843 3,871 879 
 λ 3,844 590 798 138 662 202 1,293 32 178 
C2 κ 26,041 22,814 5,797 3,853 2,026 6,900 2,539 1,331 266 
 λ 1,240 183 305 44 262 62 493 65 
Total κ 115,748 99,059 21,485 13,758 8,015 25,884 10,382 5,202 1,145 
 λ 5,084 773 1,103 182 924 264 1,786 41 243 
No. of Unique Sequences
No. of Clones
No. of Mutations
SampleChainProcessedFunctionalTotalFunctionalNonfunctionalaFunctionalNonfunctionalaFunctional SilentNonfunctional Silenta
NP1 κ 2,526 534 1,163 232 942 739 4,225 183 1,081 
 λ 25,755 21,332 1,298 516 794 2,728 3,691 718 703 
NP2 κ 8,703 1,555 2,671 464 2,233 1,546 12,489 370 3,347 
 λ 84,141 70,990 2,552 930 1,653 4,923 7,874 1,440 1,566 
NP3 κ 10,006 2,226 3,245 656 2,626 2,305 14,992 565 3,758 
 λ 89,976 76,406 2,720 1,045 1,711 5,528 8,508 1,590 1,767 
NP4 κ 4,106 832 1,813 373 1,459 1,095 7,467 252 1,964 
 λ 50,587 42,523 1,923 658 1,281 3,639 6,214 1,017 1,230 
Total κ 25,341 5,147 8,892 1,725 7,260 5,685 39,173 1,370 10,150 
 λ 250,459 211,251 8,493 3,149 5,439 16,818 26,287 4,765 5,266 
           
C1 κ 89,707 76,245 15,688 9,905 5,989 18,984 7,843 3,871 879 
 λ 3,844 590 798 138 662 202 1,293 32 178 
C2 κ 26,041 22,814 5,797 3,853 2,026 6,900 2,539 1,331 266 
 λ 1,240 183 305 44 262 62 493 65 
Total κ 115,748 99,059 21,485 13,758 8,015 25,884 10,382 5,202 1,145 
 λ 5,084 773 1,103 182 924 264 1,786 41 243 
a

The number of nonfunctional sequences is the number of processed sequences minus the number of functional sequences.

Previous data suggested that the nonfunctional κ sequences would accumulate SHMs because they were derived from GC B cells, where the SHM machinery is active, and where Ig is clearly being transcribed (because the sequencing was carried out from an mRNA template) (9, 13, 30). To determine the extent of SHM, we used IMGT/HighV-QUEST to identify and align the germline V and J gene segments for each of the observed sequences. We found that the nonfunctional κ L chain sequences in NP-binding GC B cells had an average mutation frequency of 1.49% in the V region. Interestingly, the nonfunctional κ sequences were consistently more mutated than functional κ sequences, which had a mutation frequency of 0.95% (p < 1e−15) (Fig. 1). The decreased mutation frequency observed in functional sequences was not due to the use of V segments with fewer hot-spot motifs, because a similar pattern was observed when comparing mutation frequencies in functional and nonfunctional sequences using the same V segment. The λ L chains sequenced from the same B cell subsets had a similar mutation frequency as the functional κ-chain sequences. In contrast, κ-chain sequences from non-GC cells sorted from two unimmunized mice were significantly less mutated than κ-chain sequences from GC cells (p < 1e−15). The mutation frequencies in all of these populations were higher than the estimated sequencing error rate after correction using the molecular barcodes (estimated sequencing error rate <1e−9). To identify independent SHM events, we partitioned the sequences into groups that were likely to be clonally related (see 2Materials and Methods), and identified the set of distinct mutations from each clone. In total, we obtained 39,173 independent and unselected mutations from NP-binding nonfunctional κ-chain sequences.

FIGURE 1.

Nonfunctional sequences accumulate a high frequency of somatic mutations. κ L chains were sequenced from four NP-immunized (NP1-4) or two control mice (C1-2) (Table I). The mutation frequency of the V segment in functional (shaded boxes) and nonfunctional (open boxes) sequences was calculated by comparing each sequence with its germline segment determined by IMGT/HighV-Quest. Boxes indicate the mean mutation frequency and 5–95% quantiles. Mean mutation frequencies for comparable λ L chains are indicated by X.

FIGURE 1.

Nonfunctional sequences accumulate a high frequency of somatic mutations. κ L chains were sequenced from four NP-immunized (NP1-4) or two control mice (C1-2) (Table I). The mutation frequency of the V segment in functional (shaded boxes) and nonfunctional (open boxes) sequences was calculated by comparing each sequence with its germline segment determined by IMGT/HighV-Quest. Boxes indicate the mean mutation frequency and 5–95% quantiles. Mean mutation frequencies for comparable λ L chains are indicated by X.

Close modal

Similar to previous models (12), we defined the mutability of a DNA motif as the probability of the central base in the motif being targeted by SHM relative to all other motifs (described in 2Materials and Methods). To capture the classic SHM hot spots (WRC/GYW and WA/TW, where W = [A, T], R = [G, A], Y = [C, T]) (2) on both the forward and reverse strands, we modeled targeting patterns based on 5-mer motifs. Furthermore, we used both replacement (R) and silent (S) mutations because neither type of mutation should be subject to selection pressures in nonfunctional κ-chain sequences in λ+ cells. Using the notation developed in our previous work, we refer to this model as RS, 5-mer, nonfunctional (RS5NF) (Fig. 2). The RS5NF targeting model contained 825 directly observed motifs (Fig. 2A) and 199 unobserved motifs, whose values were inferred from similar motifs (Fig. 2B). We built a separate targeting model using data from each mouse and confirmed that the mutabilities were highly similar among them (Supplemental Fig. 2A). As expected, classic hot spots generally had higher mutabilities, whereas classic cold spots had lower mutabilities. Furthermore, we observed potential new strand-symmetric hot-spot motifs, CRCY/RGYG (Fig. 2, orange bars) and ATCT/AGAT (Fig. 2, magenta bars), whose mean mutability levels were more than twice as large as the mean mutability level of the classic WA/TW hot spot. These SHM targeting patterns remained stable over the course of B cell maturation because models constructed from sequences with <1% mutation frequency were well correlated with those constructed from sequences with ≥1% mutation frequency; the strength of the correlations between highly and lowly mutated sequences (average pairwise Pearson ρ = 0.61) was comparable with pairwise correlations for sequences with <1% mutation frequency (average Pearson ρ = 0.58).

FIGURE 2.

SHM targeting model for murine L chains. An RS5NF model was constructed to estimate the mutability of 5-mer motifs using both R and S mutations in nonfunctional κ sequences from NP-binding cells in four immunized mice (NP1-4 in Table I). Hedgehog plots depict the relative targeting of 5-mer motifs centered on the bases A, T, C, and G, with each center base in an individual circle. (A) The SHM targeting model in which the mutability values were measured directly from Ig sequencing data. (B) The complete SHM targeting model with inferred values for 5-mer motifs that could not be directly estimated. Classic hot spots are colored in red (WRC/GYW) and green (WA/TW), whereas cold spots are colored in blue (SYC/GRS). Potential novel hot spots are colored in orange (CRCY/RGYG, R = [G,A], Y = [C,T]) and magenta (ATCT/AGAT). All other motifs are shown in grey.

FIGURE 2.

SHM targeting model for murine L chains. An RS5NF model was constructed to estimate the mutability of 5-mer motifs using both R and S mutations in nonfunctional κ sequences from NP-binding cells in four immunized mice (NP1-4 in Table I). Hedgehog plots depict the relative targeting of 5-mer motifs centered on the bases A, T, C, and G, with each center base in an individual circle. (A) The SHM targeting model in which the mutability values were measured directly from Ig sequencing data. (B) The complete SHM targeting model with inferred values for 5-mer motifs that could not be directly estimated. Classic hot spots are colored in red (WRC/GYW) and green (WA/TW), whereas cold spots are colored in blue (SYC/GRS). Potential novel hot spots are colored in orange (CRCY/RGYG, R = [G,A], Y = [C,T]) and magenta (ATCT/AGAT). All other motifs are shown in grey.

Close modal

To test whether SHM targeting at the nonfunctional locus was similar to the functional locus, we compared targeting models constructed from functional and nonfunctional sequences. Because selection pressures may influence mutation patterns in functional sequences, we constructed a model based only on silent mutations. For this comparison, the model for nonfunctional sequences was also based on silent mutations to ensure comparability. Specifically, we built a silent, 5-mer, nonfunctional (S5NF) model using silent mutations from nonfunctional κ sequences from GC cells in immunized mice, and an S5F model using silent mutations from functional κ sequences from both GC B cells in immunized mice and non-GC B cells in control mice. The mutability models for nonfunctional (S5NF) and functional (S5F) sequences were highly correlated (Pearson ρ = 0.86 and Spearman ρ = 0.58 over 250 common 5-mer motifs), consistent with the notion that the same mutational pathways are operating at both loci in this murine system (Fig. 3A).

FIGURE 3.

SHM targeting is conserved at the nonfunctional locus. Functional or nonfunctional sequences were analyzed separately to estimate the mutability of each 5-mer (points) using data from (A) mouse κ L chains from six mice (NP1-4 and C1-2 in Table I) (or four NP-immunized mice only for nonfunctional chains), (B) human L chains from four subjects (HD samples in Table II), or (C) human H chains from four subjects (HD samples in Table II). Classic hot and cold spots are colored as specified in Fig. 2. The mutabilities were normalized such that the average mutability of the motifs observed in both loci was 1.

FIGURE 3.

SHM targeting is conserved at the nonfunctional locus. Functional or nonfunctional sequences were analyzed separately to estimate the mutability of each 5-mer (points) using data from (A) mouse κ L chains from six mice (NP1-4 and C1-2 in Table I) (or four NP-immunized mice only for nonfunctional chains), (B) human L chains from four subjects (HD samples in Table II), or (C) human H chains from four subjects (HD samples in Table II). Classic hot and cold spots are colored as specified in Fig. 2. The mutabilities were normalized such that the average mutability of the motifs observed in both loci was 1.

Close modal
Table II.
Human sequencing data
No. of Unique Sequences
No. of Clones
No. of Mutations
SampleChainProcessedFunctionalNonfunctionalaTotalFunctionalNonfunctionalaFunctionalNonfunctionalaFunctional SilentNonfunctional Silenta
HD07 5,697 5,570 58 2,870 2,789 57 24,032 616 8,650 189 
 21,011 20,362 454 6,455 5,944 418 41,135 2,077 14,280 479 
HD09 8,628 8,362 199 6,426 6,182 190 48,479 1,632 16,831 469 
 36,596 34,846 1,514 16,319 14,909 1,266 113,235 7,909 40,399 1,837 
HD10 8,742 8,462 233 7,333 7,064 217 49,327 1,627 16,908 354 
 26,089 24,553 1,338 14,611 13,322 1,144 89,687 6,243 31,415 1,299 
HD13 9,193 8,900 218 5,903 5,647 200 56,551 2,157 20,200 552 
 27,929 26,505 1,191 12,508 11,490 901 87,735 6,532 31,694 1,484 
Total 32,260 32,294 708 22,532 21,682 664 178,389 6,032 62,589 1,564 
 111,625 106,266 4,497 49,893 45,665 3,729 331,792 22,761 117,788 5,099 
No. of Unique Sequences
No. of Clones
No. of Mutations
SampleChainProcessedFunctionalNonfunctionalaTotalFunctionalNonfunctionalaFunctionalNonfunctionalaFunctional SilentNonfunctional Silenta
HD07 5,697 5,570 58 2,870 2,789 57 24,032 616 8,650 189 
 21,011 20,362 454 6,455 5,944 418 41,135 2,077 14,280 479 
HD09 8,628 8,362 199 6,426 6,182 190 48,479 1,632 16,831 469 
 36,596 34,846 1,514 16,319 14,909 1,266 113,235 7,909 40,399 1,837 
HD10 8,742 8,462 233 7,333 7,064 217 49,327 1,627 16,908 354 
 26,089 24,553 1,338 14,611 13,322 1,144 89,687 6,243 31,415 1,299 
HD13 9,193 8,900 218 5,903 5,647 200 56,551 2,157 20,200 552 
 27,929 26,505 1,191 12,508 11,490 901 87,735 6,532 31,694 1,484 
Total 32,260 32,294 708 22,532 21,682 664 178,389 6,032 62,589 1,564 
 111,625 106,266 4,497 49,893 45,665 3,729 331,792 22,761 117,788 5,099 
a

Nonfunctional sequences are defined as sequences with out-of-frame junctions.

To test whether SHM targeting was also consistent across functional and nonfunctional loci in humans, we recruited four healthy human subjects and sequenced both their Ig H and L chains. Overall, we collected 32,260 unique H chain sequences and 111,625 unique L chain sequences (Table II). Comparison of S5F targeting models constructed independently for each subject showed that targeting patterns were highly conserved across individuals (Supplemental Fig. 2B, 2C), consistent with previous observations (12). We thus agglomerated mutations from all individuals to build L chain S5F and S5NF targeting models, using functional and nonfunctional (defined by out-of-frame junctions) sequences, respectively. The relative mutability of DNA motifs in the nonfunctional and functional sequences was highly correlated (Pearson ρ = 0.93, Spearman ρ = 0.88; Fig. 3B). Similar results were obtained for the human H chain sequences (Pearson ρ = 0.93, Spearman ρ = 0.86; Fig. 3C). Overall, these results suggest the SHM targeting mechanism is highly conserved at the functional and nonfunctional loci in both the human and the mouse systems.

To understand whether SHM targeting was species specific, we compared the mouse and human L chain mutability models. Overall, there was a high correlation between the mouse S5NF model and the human S5F model (Pearson ρ = 0.63, Spearman ρ = 0.63; Fig. 4A). In both species, classic hot spots generally exhibited higher mutabilities, whereas classic cold spots exhibited lower mutabilities. Despite this similarity, we observed a global shift in mutabilities at certain bases. In the mouse model, 5-mer motifs with C or G (hereafter written as C/G) as the central base had much higher mutabilities compared with the same motifs in the human model across the 299 motifs observed in both models, whereas 5-mer motifs centered at A or T (hereafter written as A/T) had much lower mutabilities (Fig. 4B). When considering the mutability of motifs centered on each base separately, the mutabilities were highly consistent (Supplemental Fig. 3), suggesting that the difference between mouse and human SHM targeting exists mainly in the overall targeting frequencies of C/G verses A/T bases. The models adjust for differences in CG contents between mice and humans by normalizing mutabilities using frequencies of microsequence motifs in the germline sequences. To ensure that the observed species-specific difference was not due to tissue-specific differences (mouse B cells were collected from the spleen, whereas human B cells were from PBMCs), we examined an independent data set derived from lymph node B cells from four multiple sclerosis subjects (20). The largest mutability difference between species (i.e., mouse versus human) was five times greater than the largest difference between tissues (i.e., blood versus lymph node) in the same species (data not shown), suggesting that the large increase observed in C/G targeting in mice compared with humans was likely due to a species-dependent mechanistic difference in SHM.

FIGURE 4.

The mutability of C/G (relative to A/T) is increased in mice compared with humans. SHM targeting models were estimated using silent mutations from nonfunctional mouse κ L chains in four NP-immunized mice (NP1-4 in Table I) or functional human L chains from four subjects (HD samples in Table II). (A) The mutability of each 5-mer (points) was compared between the mouse and human models. (B) The distribution of mutabilities for all 5-mer motifs centered at each base (A, T, G, and C) was compared between the mouse (shaded boxes) and human (open boxes) models. The mutabilities were normalized such that the average mutability of the motifs observed in both species was 1.

FIGURE 4.

The mutability of C/G (relative to A/T) is increased in mice compared with humans. SHM targeting models were estimated using silent mutations from nonfunctional mouse κ L chains in four NP-immunized mice (NP1-4 in Table I) or functional human L chains from four subjects (HD samples in Table II). (A) The mutability of each 5-mer (points) was compared between the mouse and human models. (B) The distribution of mutabilities for all 5-mer motifs centered at each base (A, T, G, and C) was compared between the mouse (shaded boxes) and human (open boxes) models. The mutabilities were normalized such that the average mutability of the motifs observed in both species was 1.

Close modal

The increase in C/G (verses A/T) targeting in the mouse model suggested a relative decrease in the recruitment of the MSH2/MSH6 DNA mismatch repair pathway that drives mutation spreading from the original AID-induced lesion, thus leading to a shift in the balance toward phase I (versus phase II) SHM. To test whether there was also a decreased efficiency of recruiting uracil-DNA glycosylase (UNG), the base excision repair enzyme associated with phase Ib of SHM, we analyzed the pattern of nucleotide substitution frequencies computed for each base by aggregating all mutations at positions where only silent mutations were possible (Table III). We reasoned that decreased recruitment of UNG would lead to an increase in AID-induced lesions being resolved through simple replication events, and consequently an increase in the frequency of transition mutations (versus transversion mutations) at C/G bases. Consistent with this hypothesis, and with previously reported transition frequencies in mice (4, 31, 32) and humans (30), the transition frequencies from C-to-T and G-to-A in mice were significantly higher than those in humans (Table III) (p < 1e−15, Pearson’s χ2 test). This species-specific difference in transition frequency was found only at C/G bases, whereas the substitution profiles at A/T bases were similar between mice and humans. This is consistent with a shift in the balance toward phase Ia compared with phase Ib SHM in mice. Overall, these observations suggest a general decrease in the efficiency of recruiting DNA error-prone repair pathways normally associated with SHM to the sites of AID-induced lesions in mice compared with humans.

Table III.
SHM substitution matrices
Mouse L Chain
Human L Chain
Human H Chain
From\ToACGTACGTACGT
— 0.16 0.54 0.29 — 0.26 0.50 0.24 — 0.28 0.49 0.23 
0.13 — 0.09 0.78 0.25 — 0.30 0.45 0.21 — 0.35 0.44 
0.77 0.14 — 0.09 0.53 0.31 — 0.16 0.49 0.35 — 0.16 
0.25 0.57 0.18 — 0.20 0.57 0.23 — 0.19 0.44 0.37 — 
Mouse L Chain
Human L Chain
Human H Chain
From\ToACGTACGTACGT
— 0.16 0.54 0.29 — 0.26 0.50 0.24 — 0.28 0.49 0.23 
0.13 — 0.09 0.78 0.25 — 0.30 0.45 0.21 — 0.35 0.44 
0.77 0.14 — 0.09 0.53 0.31 — 0.16 0.49 0.35 — 0.16 
0.25 0.57 0.18 — 0.20 0.57 0.23 — 0.19 0.44 0.37 — 

The substitution frequency from each base to every other base is computed using the set of mutations observed at positions where all base changes are silent.

Because of different chromosomal contexts, it is possible that the SHM process for H and L chains is subject to distinct regulatory machineries. To examine this, we compared the human L chain S5F model with the human H chain S5F model previously built. The L chain and the H chain models were well correlated (Pearson ρ = 0.66, Spearman ρ = 0.75; Fig. 5A), and the classic hot and cold spots showed the expected deviations from neutral motifs. However, the cross-chain correlations were significantly weaker than the correlations for same-chain comparisons across individuals (p < 1e−15; Fig. 5B). Furthermore, >30% of the 5-mer motifs had mutabilities that were >2-fold different between H and L chains (100/323 motifs observed in both models), and >7% of the motifs were at least 4-fold different (25/323 motifs), whereas <17% of the motifs were >2-fold different between the same type of chains (when comparing separate models built for each individual). The targeting model built solely on human κ L chains (instead of combining κ and λ) did not yield higher correlation with the mouse κ-chain model, possibly because a smaller database is more prone to noise. Overall, these results indicate that H and L chains exhibit similar SHM targeting patterns, but distinct mechanisms are likely acting at each locus, and analysis of H and L chain SHM patterns is likely to require separate targeting models.

FIGURE 5.

SHM targeting in human H and L chains is distinct. SHM targeting models were estimated using silent mutations in functional sequences from four subjects (HD samples in Table II). (A) The mutability of each 5-mer (points) was compared between L and H chains for the models based on all individuals (HD samples) combined. (B) Separate SHM targeting models were constructed for H chains and L chains from each individual, and the Spearman correlation coefficients of the 5-mer mutabilities between each pair of individuals was determined. Comparisons between the same types of chains (H versus H or L versus L) are represented by empty bars, whereas comparisons between H and L chains are represented by filled bars. The mutabilities were normalized such that the average mutability of the motifs observed in both chains was 1.

FIGURE 5.

SHM targeting in human H and L chains is distinct. SHM targeting models were estimated using silent mutations in functional sequences from four subjects (HD samples in Table II). (A) The mutability of each 5-mer (points) was compared between L and H chains for the models based on all individuals (HD samples) combined. (B) Separate SHM targeting models were constructed for H chains and L chains from each individual, and the Spearman correlation coefficients of the 5-mer mutabilities between each pair of individuals was determined. Comparisons between the same types of chains (H versus H or L versus L) are represented by empty bars, whereas comparisons between H and L chains are represented by filled bars. The mutabilities were normalized such that the average mutability of the motifs observed in both chains was 1.

Close modal

To explore the global relationships between the species- and chain-specific SHM targeting models, we built separate targeting models for each individual mouse and human of the following categories: 1) κ L chain sequences from four NP-immunized mice (RS5NF models), 2) κ L chain sequences from two control mice (S5F models), 3) κ and λ L chain sequences from four healthy human subjects (S5F models), 4) H chain sequences from four healthy human subjects (S5F models), 5) H chain sequences from 11 human subjects used in a previous study by Yaari et al. (12) (S5F models), and 6) sequencing errors from three mouse samples (described in 2Materials and Methods). The pairwise distance between each of these models was defined as follows: 1 − Spearman correlation of the mutability estimates. Principal component analysis was then used to project this distance matrix onto the first two principal components, which together account for most of the variance in the distance matrix (95%) (see 2Materials and Methods) (Fig. 6A). We found that mouse L chains, human L chains, and human H chains clustered into distinct groups. The high consistency of samples within each group, combined with the disparity across the groups, indicates that different species and chains have distinct targeting patterns. These differences were not completely due to the increased targeting of C/G bases in mice, because renormalizing the models to have the same mutability for all 5-mers with the same central base produced a similar result. The differences observed across chains and between species were also not likely due to usage of distinct germline sequences because mutabilities are normalized based on the frequency of occurrence of each motif in the germline sequences. To confirm this, the analyses were repeated using separate models constructed for each V gene family independently. These results showed that the variation between V gene families at the same locus was small compared with the differences observed across chains and between species. All of the models were clustered far from the sequencing error model, providing reassurance that the SHM targeting models were not significantly influenced by potential sequencing errors. Interestingly, the human L chain models were intermediate between the murine L chain and the human H chain models, suggesting some conservation of SHM mechanisms at this locus. To better understand how these differences in SHM targeting could impact affinity maturation, we designed a simulation-based analysis to quantify how often the models lead to different mutations being introduced in a repertoire of 2000 Mus mus IGKV1-135*01 sequences (the most frequent segment among mouse nonfunctional sequences), each carrying five mutations. Ten simulated repertoires were generated for each SHM model (mouse L chain, human L chain, and human H chain) using a mutation algorithm previously described (8). We found that two repertoires that responded independently under the same mutability model (mouse L chain) would differ in ∼13% of mutations. In contrast, responses that evolved under the mouse L chain model differed from those using the human H chain model by >38% (Fig. 6B). In contrast with the PCA in Fig. 6A, the human L chain model was not closer to the mouse L chain than the human H chain. This is because the V segment chosen for the simulation lacks the sequence motifs exhibiting similar mutabilities between the mouse L chain model and the human L chain model. However, in simulations with other V segment sequences, we observed that the human L chain model rendered more similar mutations with the mouse L chain model. Overall, these analyses provide strong evidence for species- and chain-specific targeting in the SHM process, and highlight the need to use the proper targeting models for the analysis of SHM patterns.

FIGURE 6.

SHM targeting displays species- and chain-specific features. (A) The distance between each pair of targeting models was defined as follows: 1 − Spearman correlation measure between the mutabilities of the 5-mer motifs observed in both models. PCA was used to project this distance matrix onto two components that explain most of the variance. Analyzed models include S5F model from human H chains in 15 subjects (HD samples in Table II and previous data set [12]; open circles), S5F model from human L chains in four subjects (HD samples in Table II; squares), S5F model from murine L chains in two mice (C1-2 in Table I; open triangles), RS5NF model from murine L chains in four mice (NP1-4 in Table I; filled triangles), and sequencing errors determined from large UID groups (×). (B) A simulation-based analysis was used to quantify how often the SHM targeting models lead to different mutations being introduced in a repertoire of 2000 Mus mus IGKV1-135*01 sequences (the most frequent segment among nonfunctional mouse sequences), each carrying five mutations (see 2Materials and Methods for details).

FIGURE 6.

SHM targeting displays species- and chain-specific features. (A) The distance between each pair of targeting models was defined as follows: 1 − Spearman correlation measure between the mutabilities of the 5-mer motifs observed in both models. PCA was used to project this distance matrix onto two components that explain most of the variance. Analyzed models include S5F model from human H chains in 15 subjects (HD samples in Table II and previous data set [12]; open circles), S5F model from human L chains in four subjects (HD samples in Table II; squares), S5F model from murine L chains in two mice (C1-2 in Table I; open triangles), RS5NF model from murine L chains in four mice (NP1-4 in Table I; filled triangles), and sequencing errors determined from large UID groups (×). (B) A simulation-based analysis was used to quantify how often the SHM targeting models lead to different mutations being introduced in a repertoire of 2000 Mus mus IGKV1-135*01 sequences (the most frequent segment among nonfunctional mouse sequences), each carrying five mutations (see 2Materials and Methods for details).

Close modal

Models of SHM microsequence targeting and substitution bias can provide mechanistic insights into the underlying mutation and DNA repair pathways, and serve as critical background models for statistical analyses of mutation patterns. The development of SHM targeting models has been hindered by the difficulty in identifying large numbers of somatic mutations that have not been influenced by selection. To overcome this limitation, we used high-throughput sequencing to generate a large number of κ L chains from NP-binding GC B cells. Because B cells specific for the NP hapten use λ L chains, many of these cells also carry a nonfunctional κ L chain sequence. These nonfunctional κ L chains accumulated significant numbers of somatic mutations after NP immunization. To model SHM targeting, we adapted the 5-mer microsequence targeting framework (12), which estimates the mutability of each base as a function of the surrounding two bases upstream and two bases downstream. This approach captured SHM targeting patterns that were highly reproducible across individual mice. The patterns were also similar in sequences with low (<1%) and high (≥1%) mutation frequencies, suggesting that the SHM process is stable over time. This contrasts with the hierarchical model proposed by Yeap et al. (33). Although our data also show that early mutations are targeted preferentially to hot-spot motifs, this is due to their higher overall mutability. Hot- and cold-spot mutations are both observed in sequences with few mutations (<1%) and appear at their expected frequencies. Comparative analysis of this mouse L chain model with SHM targeting models for human L and H chains identified both species- and chain-specific differences.

Although the mutability of classic hot- and cold-spot motifs was broadly similar in mice and humans, SHM targeting in mice was characterized by increased C/G targeting. Previous studies suggested that SHM targeting in mice and humans was similar, and that there were no chain-specific differences (9). However, the number of sequences and mutations in these studies were significantly smaller compared with this study. Although the gross patterns are similar with the previous finding, the novel observations of both species- and locus-specific patterns of mutation imply that there are unique molecular mechanisms operating and/or that the extent of activity of certain mechanisms varies across species and loci. The increased targeting of C/G (versus A/T) bases combined with the increased transition frequency specifically at C/G bases in mice is consistent with less involvement of the DNA error repair pathways normally associated with SHM relative to humans. Thus, in this mouse model, more of the AID-induced lesions appear to be resolved by a simple replication event without the involvement of base excision or mismatch repair. L and H chains display overall similar targeting patterns, but the detailed mutability patterns within each hot or cold spot displayed large discrepancy. One explanation for locus-specific targeting is that differences in the regulatory elements (e.g., promoters and enhancers) in H and L chain genes control recruitment of AID and DNA repair factors like UNG and error-prone polymerases. Each repair mechanism may have distinct preference in targeting motifs, and the shift in the contribution of repair mechanisms may be reflected in microsequence targeting specificity. Overall, these species- and chain-specific differences highlight the importance of using an SHM targeting model that is appropriately matched for the data being analyzed.

A potential limitation of this study concerns the comparison of functional versus nonfunctional sequences. In the mouse system, the total number of mutations in functional sequences was low, and thus may be more influenced by sequencing errors. However, the confirmation that SHM targeting in functional and nonfunctional sequences in human subjects was also similar supports our conclusion that data from nonfunctional sequences can be used as a model for mutation patterns in functional sequences. It is possible that the 5-mer model does not fully capture the influence of microsequence context on mutability and that targeting may be influenced by bases that are further upstream or downstream. However, extending the analysis to longer motifs (e.g., 7-mers) would require much higher sequence coverage to account for the large number of possible nucleotide combinations (e.g., 16,384 combinations of 7-mers). It is also important to acknowledge that no motif-based model may be sufficient to capture the full complexity of SHM targeting, which may be influenced uniquely by each V gene context. However, the observation that the same conclusions hold when running the analysis independently on each V gene family suggests that this is unlikely. There is also a caveat in the comparison between species. The human samples in the study were from a diverse population, whereas the mouse models were based on a single inbred laboratory strain. It is possible that the patterns observed are specific to this strain. Finally, it is important to note that the sequencing error models were built from sequences with identical UIDs, which captures the PCR and sequencing errors, but not errors introduced in the reverse transcription step prior to the UID-tagging process.

The models developed in this study quantitatively characterize SHM targeting without the biasing influence of selection pressures. In addition to revealing previously unrecognized hot and cold spots as well as the influence of species and locus, these models provide important background distributions for statistical analysis of affinity maturation in experimentally derived Ig sequencing data. Specifically, we developed two new models: the mouse RS5NF L chain and the human S5F L chain models. These models complement the previously published human S5F H chain model (12) and are available in the SHazaM R package (24) (http://shazam.readthedocs.io/).

We thank Dr. Gur Yaari and Dr. Mohamed Uduman for feedback and technical assistance and the Yale University Biomedical High Performance Computing Center for use of computing resources.

This work was supported by the National Institutes of Health (Grants R03AI092379 and R01AI104739 to S.H.K., Grant R01AI43603 to M.J.S.), the Myasthenia Gravis Foundation of America, the National Institute of Allergy and Infectious Diseases (Awards R01AI114780 and U19 AI056363 to K.C.O.), the Natural Sciences and Engineering Research Council of Canada (NSERC PGS-M to A.C.), and the National Library of Medicine (Grant T15LM07056 to J.A.V.H.). The Yale University Biomedical High Performance Computing Center is supported by National Institutes of Health Grants RR19895 and RR029676-01.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The online version of this article contains supplemental material.

Abbreviations used in this article:

GC

germinal center

NP

(4-hydroxy-3-nitrophenyl)acetyl

RS

replacement and silent

RS5NF

RS, 5-mer, nonfunctional

S5F

silent, 5-mer, functional

S5NF

silent, 5-mer, nonfunctional

SHM

somatic hypermutation

UID

unique identifier

UNG

uracil-DNA glycosylase.

1
McKean
D.
,
Huppi
K.
,
Bell
M.
,
Staudt
L.
,
Gerhard
W.
,
Weigert
M.
.
1984
.
Generation of antibody diversity in the immune response of BALB/c mice to influenza virus hemagglutinin.
Proc. Natl. Acad. Sci. USA
81
:
3180
3184
.
2
Peled
J. U.
,
Kuang
F. L.
,
Iglesias-Ussel
M. D.
,
Roa
S.
,
Kalis
S. L.
,
Goodman
M. F.
,
Scharff
M. D.
.
2008
.
The biochemistry of somatic hypermutation.
Annu. Rev. Immunol.
26
:
481
511
.
3
Rogozin
I. B.
,
Kolchanov
N. A.
.
1992
.
Somatic hypermutagenesis in immunoglobulin genes. II. Influence of neighbouring base sequences on mutagenesis.
Biochim. Biophys. Acta
1171
:
11
18
.
4
Betz
A. G.
,
Rada
C.
,
Pannell
R.
,
Milstein
C.
,
Neuberger
M. S.
.
1993
.
Passenger transgenes reveal intrinsic specificity of the antibody hypermutation mechanism: clustering, polarity, and specific hot spots.
Proc. Natl. Acad. Sci. USA
90
:
2385
2388
.
5
Pham
P.
,
Bransteitter
R.
,
Petruska
J.
,
Goodman
M. F.
.
2003
.
Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation.
Nature
424
:
103
107
.
6
Kenneth
H.
,
Anna
F.
,
Gerton
L.
,
Pybus
O. G.
.
2016
.
The diversity and molecular evolution of B cell receptors during infection.
Mol. Biol. Evol.
33
:
1147
1157
.
7
Betz
A. G.
,
Neuberger
M. S.
,
Milstein
C.
.
1993
.
Discriminating intrinsic and antigen-selected mutational hotspots in immunoglobulin V genes.
Immunol. Today
14
:
405
411
.
8
Yaari
G.
,
Uduman
M.
,
Kleinstein
S. H.
.
2012
.
Quantifying selection in high-throughput Immunoglobulin sequencing data sets.
Nucleic Acids Res.
40
:
e134
.
9
Shapiro
G. S.
,
Aviszus
K.
,
Ikle
D.
,
Wysocki
L. J.
.
1999
.
Predicting regional mutability in antibody V genes based solely on di- and trinucleotide sequence composition.
J. Immunol.
163
:
259
268
.
10
Cowell
L. G.
,
Kepler
T. B.
.
2000
.
The nucleotide-replacement spectrum under somatic hypermutation exhibits microsequence dependence that is strand-symmetric and distinct from that under germline mutation.
J. Immunol.
164
:
1971
1976
.
11
Shapiro
G. S.
,
Ellison
M. C.
,
Wysocki
L. J.
.
2003
.
Sequence-specific targeting of two bases on both DNA strands by the somatic hypermutation mechanism.
Mol. Immunol.
40
:
287
295
.
12
Yaari
G.
,
Vander Heiden
J. A.
,
Uduman
M.
,
Gadala-Maria
D.
,
Gupta
N.
,
Stern
J. N.
,
O’Connor
K. C.
,
Hafler
D. A.
,
Laserson
U.
,
Vigneault
F.
,
Kleinstein
S. H.
.
2013
.
Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data.
Front. Immunol.
4
:
358
.
13
Oprea
M.
,
Cowell
L. G.
,
Kepler
T. B.
.
2001
.
The targeting of somatic hypermutation closely resembles that of meiotic mutation.
J. Immunol.
166
:
892
899
.
14
Inlay
M. A.
,
Gao
H. H.
,
Odegard
V. H.
,
Lin
T.
,
Schatz
D. G.
,
Xu
Y.
.
2006
.
Roles of the Ig kappa light chain intronic and 3′ enhancers in Igk somatic hypermutation.
J. Immunol.
177
:
1146
1151
.
15
Rouaud
P.
,
Vincent-Fabert
C.
,
Saintamand
A.
,
Fiancette
R.
,
Marquet
M.
,
Robert
I.
,
Reina-San-Martin
B.
,
Pinaud
E.
,
Cogné
M.
,
Denizot
Y.
.
2013
.
The IgH 3′ regulatory region controls somatic hypermutation in germinal center B cells.
J. Exp. Med.
210
:
1501
1507
.
16
Buerstedde
J.-M.
,
Alinikula
J.
,
Arakawa
H.
,
McDonald
J. J.
,
Schatz
D. G.
.
2014
.
Targeting of somatic hypermutation by immunoglobulin enhancer and enhancer-like sequences.
PLoS Biol.
12
:
e1001831
.
17
Betz
A. G.
,
Milstein
C.
,
González-Fernández
A.
,
Pannell
R.
,
Larson
T.
,
Neuberger
M. S.
.
1994
.
Elements regulating somatic hypermutation of an immunoglobulin kappa gene: critical role for the intron enhancer/matrix attachment region.
Cell
77
:
239
248
.
18
Chan
O. T.
,
Hannum
L. G.
,
Haberman
A. M.
,
Madaio
M. P.
,
Shlomchik
M. J.
.
1999
.
A novel mouse with B cells but lacking serum antibody reveals an antibody-independent role for B cells in murine lupus.
J. Exp. Med.
189
:
1639
1648
.
19
Tsioris
K.
,
Gupta
N. T.
,
Ogunniyi
A. O.
,
Zimnisky
R. M.
,
Qian
F.
,
Yao
Y.
,
Wang
X.
,
Stern
J. N.
,
Chari
R.
,
Briggs
A. W.
, et al
.
2015
.
Neutralizing antibodies against West Nile virus identified directly from human B cells by single-cell analysis and next generation sequencing.
Integr Biol (Camb)
7
:
1587
1597
.
20
Stern
J. N.
,
Yaari
G.
,
Vander Heiden
J. A.
,
Church
G.
,
Donahue
W. F.
,
Hintzen
R. Q.
,
Huttner
A. J.
,
Laman
J. D.
,
Nagra
R. M.
,
Nylander
A.
, et al
.
2014
.
B cells populating the multiple sclerosis brain mature in the draining cervical lymph nodes.
Sci. Transl. Med.
6
:
248ra107
.
21
Di Niro
R.
,
Lee
S.-J.
,
Vander Heiden
J. A.
,
Elsner
R. A.
,
Trivedi
N.
,
Bannock
J. M.
,
Gupta
N. T.
,
Kleinstein
S. H.
,
Vigneault
F.
,
Gilbert
T. J.
, et al
.
2015
.
Salmonella infection drives promiscuous B cell activation followed by extrafollicular affinity maturation.
Immunity
43
:
120
131
.
22
Vander Heiden
J. A.
,
Yaari
G.
,
Uduman
M.
,
Stern
J. N.
,
O’Connor
K. C.
,
Hafler
D. A.
,
Vigneault
F.
,
Kleinstein
S. H.
.
2014
.
pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires.
Bioinformatics
30
:
1930
1932
.
23
Alamyar
E.
,
Giudicelli
V.
,
Li
S.
,
Duroux
P.
,
Lefranc
M.-P.
.
2012
.
HighV-QUEST: the IMGT web portal for immunoglobulin (IG) or antibody and T cell receptor (TR) analysis from NGS high throughput and deep sequencing.
Immunome Res.
8
:
26
.
24
Gupta
N. T.
,
Vander Heiden
J. A.
,
Uduman
M.
,
Gadala-Maria
D.
,
Yaari
G.
,
Kleinstein
S. H.
.
2015
.
Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data.
Bioinformatics
31
:
3356
3358
.
25
Smith
D. S.
,
Creadon
G.
,
Jena
P. K.
,
Portanova
J. P.
,
Kotzin
B. L.
,
Wysocki
L. J.
.
1996
.
Di- and trinucleotide target preferences of somatic mutagenesis in normal and autoreactive B cells.
J. Immunol.
156
:
2642
2652
.
26
Benichou
J.
,
Ben-Hamo
R.
,
Louzoun
Y.
,
Efroni
S.
.
2012
.
Rep-Seq: uncovering the immunological repertoire through next-generation sequencing.
Immunology
135
:
183
191
.
27
Hieter
P. A.
,
Korsmeyer
S. J.
,
Waldmann
T. A.
,
Leder
P.
.
1981
.
Human immunoglobulin κ light-chain genes are deleted or rearranged in λ-producing B cells.
Nature
290
:
368
372
.
28
Jacob
J.
,
Kassir
R.
,
Kelsoe
G.
.
1991
.
In situ studies of the primary immune response to (4-hydroxy-3-nitrophenyl)acetyl. I. The architecture and dynamics of responding cell populations.
J. Exp. Med.
173
:
1165
1175
.
29
Diaw
L.
,
Siwarski
D.
,
DuBois
W.
,
Jones
G.
,
Huppi
K.
.
2000
.
Double producers of kappa and lambda define a subset of B cells in mouse plasmacytomas.
Mol. Immunol.
37
:
775
781
.
30
Dörner
T.
,
Brezinschek
H. P.
,
Brezinschek
R. I.
,
Foster
S. J.
,
Domiati-Saad
R.
,
Lipsky
P. E.
.
1997
.
Analysis of the frequency and pattern of somatic mutations within nonproductively rearranged human variable heavy chain genes.
J. Immunol.
158
:
2779
2789
.
31
Yélamos
J.
,
Klix
N.
,
Goyenechea
B.
,
Lozano
F.
,
Chui
Y. L.
,
González Fernández
A.
,
Pannell
R.
,
Neuberger
M. S.
,
Milstein
C.
.
1995
.
Targeting of non-Ig sequences in place of the V segment by somatic hypermutation.
Nature
376
:
225
229
.
32
Golding
G. B.
,
Gearhart
P. J.
,
Glickman
B. W.
.
1987
.
Patterns of somatic mutations in immunoglobulin variable genes.
Genetics
115
:
169
176
.
33
Yeap
L. S.
,
Hwang
J. K.
,
Du
Z.
,
Meyers
R. M.
,
Meng
F. L.
,
Jakubauskaitė
A.
,
Liu
M.
,
Mani
V.
,
Neuberg
D.
,
Kepler
T. B.
, et al
.
2015
.
Sequence-intrinsic mechanisms that target AID mutational outcomes on antibody genes.
Cell
163
:
1124
1137
.

The authors have no financial conflicts of interest.

Supplementary data