A feature of Ig hypermutation is the presence of hypermutable DNA sequences that are preferentially found in the V regions of Ig genes. Among these, RGYW/WRCY is the most pronounced motif (G:C is the mutable position; R = A/G, Y = C/T, and W = A/T). However, the molecular basis for the high mutability of RGYW was not known until recently. The discovery that activation-induced cytidine deaminase targets the DNA encoding V regions has enabled the analysis of its targeting properties when it is expressed outside of the context of hypermutation. We analyzed these data and found evidence that activation-induced cytidine deaminase is the major source of the RGYW mutable motif, but with a new twist: DGYW/WRCH (G:C is the mutable position; D = A/G/T, H = T/C/A) is a better descriptor of the Ig mutation hotspot than RGYW/WRCY. We also found evidence that a DNA repair enzyme may play a role in modifying the sequence of hypermutation hotspots.

Although diversity of the B cell repertoire in adaptive immunity is generated through V(D)J recombination, specificity for a particular foreign Ag is attained via the combined forces of Ig somatic hypermutation (SHM)2 and cellular selection for the high-affinity variants generated by the SHM machinery (1, 2). In this process, mutations are introduced into the DNA encoding the V domains of the Ig receptor at a rate close to six orders of magnitude greater than the spontaneous mutation rate (3, 4). The mutations generated by the SHM mechanism are not randomly distributed. Mutation hotspots in V regions occur primarily within two DNA sequence motifs: RGYW/WRCY (R = A/G, Y = C/T, and W = A/T; the hotspot in G:C is underlined), which is found in both strands, and WA/TW (the hotspot in A:T is underlined), preferentially found in one strand (5, 6, 7). DNA polymerase η probably contributes to the WA hotspot: in vitro analysis of mutation spectra of various DNA polymerases revealed a correlation between the WA/TW motif and the error specificity of human DNA polymerase η (7), and xeroderma pigmentosum patients with a defect in this polymerase display fewer A-T mutations in Ig genes (8). A candidate for the predominant RGYW mutator is a targeting complex containing activation-induced cytidine deaminase (AID). Recent evidence suggests that AID deaminates C bases in DNA (9, 10, 11, 12, 13, 14, 15). Although other substrates, such as dCMP in the nucleotide pool or cytosine in RNA, were originally suspected, the nucleotide pool in Escherichia coli cells expressing AID is unaltered (16), and direct evidence for an RNA target remains lacking. The best evidence that AID targets the DNA of IgV regions in vivo comes from the finding that mice deficient in uracil DNA glycosylase (ung) have a dramatic increase in transitions from G:C base pairs in IgV genes undergoing hypermutation (17). Thus, it appears that AID deaminates cytosines in the DNA of IgV regions, and this lesion triggers the full SHM process including other factors and possibly multiple DNA polymerases (7, 8, 17, 18, 19, 20).
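
The degenerate motif notation used above can be made concrete with a short script. The following is a minimal sketch, not the software used in this study: it assumes the standard IUPAC meanings of R, Y, W, D, and H, takes the mutable base to be the G of a G-centered motif (second position) and the C of a C-centered motif (third position), and reports 0-based positions on the strand supplied. The function names `motif_regex` and `mutable_sites` are hypothetical.

```python
import re

# IUPAC degenerate nucleotide codes used in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT",
    "D": "AGT", "H": "ACT", "N": "ACGT",
}

def motif_regex(motif):
    """Translate a degenerate motif such as 'RGYW' into a regex.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    body = "".join("[" + IUPAC[ch] + "]" for ch in motif)
    return re.compile("(?=" + body + ")")

def mutable_sites(seq, g_motif, c_motif):
    """Sorted 0-based positions of the mutable G (second base of g_motif)
    and the mutable C (third base of c_motif) on the given strand."""
    seq = seq.upper()
    sites = {m.start() + 1 for m in motif_regex(g_motif).finditer(seq)}
    sites |= {m.start() + 2 for m in motif_regex(c_motif).finditer(seq)}
    return sorted(sites)

# AGCT matches both RGYW (mutable G) and WRCY (mutable C).
print(mutable_sites("AAGCTT", "RGYW", "WRCY"))  # [2, 3]
# TGCA is outside RGYW/WRCY but inside DGYW/WRCH.
print(mutable_sites("TGCA", "RGYW", "WRCY"))    # []
print(mutable_sites("TGCA", "DGYW", "WRCH"))    # [1, 2]
```

Scanning a single strand for both the G-centered and the C-centered form is equivalent to scanning both strands for one form, because WRCY is the reverse complement of RGYW and WRCH is the reverse complement of DGYW.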

AID can deaminate ssDNA in vitro. Goodman and colleagues (11) examined the spectra of mutations generated in vitro and found a potential new variant of the RGYW hotspot motif: WRC/GYW. This observation was supported by Lieber and colleagues (12). It contradicts earlier analyses in which a highly significant correlation was found between the identity of the bases at the first/last positions of the RGYW/WRCY motif and mutability, particularly in Ig loci (5, 6, 7, 21, 22, 23, 24). Strikingly, both groups observed a high frequency of C:G>T:A mutations in WRCG/CGYW motifs (11, 12). Goodman and colleagues (11) reasonably attributed these differences from the RGYW motif to their assay target being a prokaryotic gene (lacZ). Because of these discrepancies, we re-examined the mutability of the RGYW motif and its potential variants using both new and old in vivo SHM data, noting that context differences between eukaryotic and prokaryotic genes may influence the pattern of SHM.

Seven spectra of somatic mutations in different pro- and eukaryotic genes described by Milstein et al. (6) were analyzed. The data are available upon request from I. Rogozin ([email protected]). A sequence where mutations were revealed is referred to as a target sequence.

Mutation hotspots are defined using a threshold for the number of mutations at a site. The threshold is established by analyzing the frequency distribution derived from a mutation spectrum using the CLUSTERM program (www.itb.cnr.it/webmutation/) (25, 26). CLUSTERM identifies several homogeneous classes of sites from a mutation spectrum. Each class of sites is approximated by a binomial (or Poisson) distribution. The probability of mutation is assumed to be the same for all sites in a class, so variation in mutation frequency among sites of the same class is assumed to be random and not statistically significant. In contrast, statistically significant differences in mutation frequency between sites from different classes reflect mutation hotspots. The class with the highest mutation frequency is called the hotspot class. A hotspot site is defined as a permanent member of the hotspot class, meaning that this site has a ≥0.95 probability of being assigned to the hotspot class. This guarantees that the assignment is statistically significant and, as a result, robust (26). See Rogozin et al. (26, 27) for detailed discussion of this approach and problems associated with its application.
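
The class-assignment step described above can be illustrated with a small calculation. This is only a sketch and not the CLUSTERM implementation: it assumes that the classes are Poisson and that their weights and means have already been estimated, whereas CLUSTERM derives the classes from the spectrum itself. The two-class parameters and the toy spectrum below are hypothetical values chosen for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of observing k mutations at a site."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def class_posteriors(k, weights, means):
    """Posterior probability that a site with k mutations belongs to each class."""
    joint = [w * poisson_pmf(k, m) for w, m in zip(weights, means)]
    total = sum(joint)
    return [j / total for j in joint]

def hotspot_sites(counts, weights, means, threshold=0.95):
    """Positions assigned to the highest-mean class with posterior >= threshold."""
    hot = max(range(len(means)), key=lambda c: means[c])
    return [i for i, k in enumerate(counts)
            if class_posteriors(k, weights, means)[hot] >= threshold]

# Hypothetical classes: 90% of sites with mean 0.5, 10% of sites with mean 8.
weights, means = [0.9, 0.1], [0.5, 8.0]
counts = [0, 1, 0, 2, 9, 0, 1, 12, 3, 0]   # toy spectrum, mutations per site
print(hotspot_sites(counts, weights, means))  # [4, 7]
```

In this toy spectrum the site with three mutations is not called a hotspot, because its probability of belonging to the high-frequency class falls well below the 0.95 threshold.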

Nucleotide sequence features can be correlated with a mutation spectrum, and the correlation can be tested for statistical significance. The significance of correlations between the distribution of mutable motifs and mutations along a target sequence was measured by a Monte Carlo procedure (the CONSEN program) (5). This approach takes into account the frequency of substitutions in A, T, G, and C bases, the presence of several mutations in a site, and the nucleotide sequence of the target sequence. The weight Wj of site j is defined as the number of substitutions in a mutable motif, and the total weight is W = Σ Wj. A distribution of statistical weights Wrandom was calculated for 10,000 randomly shuffled mutation spectra. Each of the spectra contained the observed number of mutations distributed similarly in all sites. The distribution of Wrandom was used to calculate the probability P(W ≤ Wrandom). This probability is equal to the fraction of groups of random mutations in which Wrandom is the same as or higher than W. Small probability values (P(W ≤ Wrandom) ≤ 0.05) indicate a significant correlation between mutable motif and mutation frequency (5, 7, 27).
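
The Monte Carlo procedure can be sketched as follows. This is not the CONSEN program; it is a minimal illustration under one assumption about how base-specific substitution frequencies are preserved, namely that mutation counts are shuffled only among sites carrying the same base. The function names, the toy target sequence, and the motif positions are hypothetical.

```python
import random

def motif_weight(counts, motif_sites):
    """Total weight W: number of substitutions at mutable-motif positions."""
    return sum(counts[j] for j in motif_sites)

def shuffled_within_base(seq, counts, rng):
    """Shuffle mutation counts among sites that share the same base.

    Assumption: this is one way to keep the number of substitutions
    at A, T, G, and C fixed in every random spectrum.
    """
    shuffled = list(counts)
    for base in sorted(set(seq)):
        idx = [i for i, b in enumerate(seq) if b == base]
        vals = [counts[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            shuffled[i] = v
    return shuffled

def monte_carlo_p(seq, counts, motif_sites, n_iter=10_000, seed=0):
    """Fraction of shuffled spectra in which W_random >= observed W."""
    rng = random.Random(seed)
    w_obs = motif_weight(counts, motif_sites)
    hits = sum(
        motif_weight(shuffled_within_base(seq, counts, rng), motif_sites) >= w_obs
        for _ in range(n_iter)
    )
    return hits / n_iter

# Toy target with ten G sites; all 18 mutations fall on two of them.
seq = "AGCT" * 10
counts = [0] * len(seq)
counts[1], counts[5] = 10, 8
print(monte_carlo_p(seq, counts, motif_sites=[1, 5]))   # small: motif explains the spectrum
print(monte_carlo_p(seq, counts, motif_sites=[9, 13]))  # 1.0: no mutations in the motif
```

Because the estimate is a fraction of 10,000 shuffles, its resolution is limited to 0.0001, which is sufficient for the 0.05 threshold used here.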

To determine the mutability of the CGYW motif, we analyzed seven mutation spectra in various Ig and non-Ig genes in hypermutating B cells. The numbers of mutations in GGYW and TGYW were similar, but the average number of mutations was much higher in AGYW (Table I). Strikingly, and in contradiction to the in vitro data, the average number of mutations in the CGYW motif was significantly lower (Table I). These combined data indicate that a key motif of Ig hypermutation is DGYW (D = A/G/T). Because the GGYW motif displays a higher frequency of mutations than the TGYW motif in V regions (e.g., in the Vh26 gene; Table I), the RGYW motif may be adequate for IgV gene targets. However, DGYW is a far better descriptor of the intrinsic targeting of the SHM machinery (Table I).

Table I. Average number of mutations in variants of GYW/WRC (a)

Motif    | gpt   | neo  | Globin | VJλ3 intron | JH4  | VκOx1Jκ5 | VH26  | Σ
---------|-------|------|--------|-------------|------|----------|-------|-----
AGYW/WRC | 11.0+ | 1.6+ | 5.3+   | 2.7+        | 1.9+ | 10.1+    | 10.3+ | 42.9
GGYW/WRC | 3.3   | 1.1+ | 2.1+   | 2.1+        | 0.8  | 6.8+     | 4.8+  | 20.8
TGYW/WRC | 3.8   | 0.4  | 1.9    | 1.5+        | 1.5+ | 6.0+     | 3.1   | 18.4
CGYW/WRC | 1.8   | 0.7  | 0.0    | 0.0         | 0.0  | 3.0      | 3.8   | 9.3

(a) Numbers of mutations were calculated only at the underlined G:C bases. Cytosines in both DNA strands are probable targets of AID. +, Indicates a significant excess of mutations in a mutable motif (in other words, significant correlation (P(W ≤ Wrandom) < 0.05) between motifs and somatic mutations). The significance of correlations was measured with a Monte Carlo procedure (the CONSEN program). CGYW/WRCG motifs were found in all studied spectra.

Context analysis of somatic mutation hotspots predicted by the CLUSTERM program (25) also suggests that DGYW, rather than the widely accepted RGYW, is the best mutable motif. We found 35 AGYW, 12 GGYW, and 17 TGYW hotspots in this analysis. However, no mutation hotspots were found in the CGYW motif. Although there were fewer CGYW target motifs (28 sites) than DGYW motifs (295 sites), the lack of mutation hotspots at CGYW motifs could not be attributed to their lower prevalence, because, given the number of sites, attaining such mutation frequencies at random was unlikely (0 of 64 vs 28 of 295; p = 0.008 by the Fisher exact test). This result was confirmed using the algorithm of the CONSEN program (5), wherein no significant targeting of somatic mutations was found within CGYW (Table I), whereas AGYW, GGYW, and TGYW displayed a significant correlation (P(W ≤ Wrandom) < 0.05) in the studied spectra (Table I).
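
For readers who wish to explore the Fisher exact test on counts of this kind, a minimal pure-Python sketch is given here. The text does not spell out how the 2 × 2 table was arranged; the example assumes rows of (hotspot count, motif site count) and columns of (CGYW, DGYW), that is [[0, 64], [28, 295]], which is only one possible reading. The two-sided definition used (summing all tables no more probable than the observed one) is a common convention and is likewise an assumption about the original analysis.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Assumed layout: hotspots (CGYW = 0, DGYW = 64) vs motif sites (CGYW = 28, DGYW = 295).
p_value = fisher_exact_two_sided(0, 64, 28, 295)
print(f"p = {p_value:.4f}")
```

The small tolerance factor guards against floating-point ties when comparing table probabilities.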

In the seven analyzed in vivo spectra, the most mutable individual sequences were the following: AGCT (the gpt, neo, and Vh26 spectra), AGCA (globin), AGTA (λ intron), TGTT (Jh4), and AGTT (VκOx1) (Table II). Thus, six AGYW/WRCT motifs and one TGYW/WRCA motif were found in vivo. In the in vitro study, wherein AID intrinsic targeting was tested, the two most mutable sequences were TACG/CGTA and TGCT/AGCA (11) (Table II), indicating that the CGYW/WRCG motif (TACG/CGTA) is highly mutable in vitro. Again, such exceptionally high mutability of CGYW/WRCG motifs has never been observed in the SHM spectra examined so far (5, 6, 7, 21, 22, 23, 24). In fact, even in the case wherein unique components of the Ig hypermutation machinery are not expected to be present (i.e., fibroblasts), DGYW remained the hotspot motif, clearly suggesting that AID alone targets these motifs (28) (Table II).

Table II. CGYW is a coldspot in vivo but a hotspot in vitro (a)

Motif | Vh26 site | No. of mutations | VκOx1Jκ5 site | No. of mutations | GFP in fibroblasts site | No. of mutations | lacZ in vitro site | No. of mutations
------|-----------|------------------|---------------|------------------|-------------------------|------------------|--------------------|-----------------
AGYW  | AGCT      | 28               | AGTT          | 28               | AGCT                    | 10               | AGCA               | 34
GGYW  | GGTA      | 17               | GGTA          | 13               | GGCA                    |                  | GGTT               | 31
TGYW  | TGCT      | 11               | TGCA          | 17               | TGCT                    |                  | TGCA               | 31
CGYW  | CGCT      |                  | CGCT          |                  | CGCA                    |                  | CGTA               | 34

(a) The polarity (strand) of all sequences was chosen such that the potentially mutable base (underlined) is a G. A motif with the highest frequency of mutations was chosen for each variant of NGYW (N = A/T/G/C). The highest ratio of mutation frequencies at AGYW vs CGYW was observed in the Vh26 spectrum among all studies of in vivo spectra (28 mutations vs 9 mutations). GFP, Green fluorescent protein.

Why is CGYW/WRCG mutable in vitro but not in vivo? We propose that AID targets WRC/GYW in vivo. Given that these simple motifs are probably found throughout the genome, it is likely that AID is targeted to IgV regions by a cofactor, and that it locally either scans for, or has the highest affinity for, the WRC/GYW motif. We further propose that when AID targets C in the WRCG motif, an unknown DNA repair enzyme correctly repairs the lesion. Because this motif contains the CG dinucleotide characteristic of CpG islands, and because CG motifs are known to be highly mutable in the vertebrate genome (29), it is likely that many repair enzymes have evolved to repair lesions at CG dinucleotides, including those originating from cytosine deamination. Consistent with this idea, AID expression in fibroblasts is also characterized by the absence of mutation at the frequently found CGYW (WRCG) motif of the green fluorescent protein locus target (28) (Table II). Because Ig hypermutation does not occur in fibroblasts, the paucity of mutations at CGYW/WRCG in these cells further indicates that the putative repair enzyme is ubiquitous. Armed with this information, we examined the possibility that the enzyme is ung by analyzing the ung−/− spectra (17). Indeed, we found 2.6 more mutations at CGs in the ung−/− spectrum, suggesting a role for this glycosylase in modifying AID’s intrinsic targeting characteristics. However, the idea that ung is the enzyme processing the deaminated cytosine in the WRCG/CGYW motif is speculative without direct experimental evidence, and it remains possible that other DNA repair enzymes are involved. In summary, we propose that AID targets WRCN (N = A/G/C/T), but when N = G, a DNA repair enzyme, perhaps ung, efficiently processes the lesion and thus eliminates mutations from WRCG/CGYW motifs in vivo. The combined action of AID and removal of mutations from the WRCG/CGYW motif leaves us with the newly refined AID-driven SHM motif of DGYW/WRCH.
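
The relationship between the proposed motifs can be checked by enumeration. The sketch below assumes the standard IUPAC meanings of the degenerate codes; it expands WRCN into its 16 concrete tetranucleotides, removes the four WRCG sequences, and confirms that the remainder is WRCH, whose reverse complements are exactly DGYW. The helper names `expand` and `revcomp` are hypothetical.

```python
from itertools import product

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "W": "AT", "D": "AGT", "H": "ACT", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def expand(motif):
    """All concrete sequences matched by a degenerate motif."""
    return {"".join(p) for p in product(*(IUPAC[ch] for ch in motif))}

def revcomp(seq):
    """Reverse complement, i.e. the same site read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]

aid_targets = expand("WRCN")          # proposed intrinsic AID targets
repaired = expand("WRCG")             # proposed to be corrected by a repair enzyme
observed = aid_targets - repaired     # what would remain visible in vivo

assert observed == expand("WRCH")
assert {revcomp(s) for s in observed} == expand("DGYW")
print(len(aid_targets), len(repaired), len(observed))  # 16 4 12
```

The in vitro hotspot TACG belongs to the removed WRCG set, and its reverse complement CGTA is the CGYW site listed for lacZ in Table II.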

We thank Jan Drake, Eugene Koonin, and Tom Kunkel for comments on the manuscript. Also, we thank Youri Pavlov and Claude-Agnes Reynaud for stimulating conversations about SHM and potential variants of the RGYW motifs.

2 Abbreviations used in this paper: SHM, somatic hypermutation; AID, activation-induced cytidine deaminase; ung, uracil DNA glycosylase.

1. Weigert, M. G., I. M. Cesari, S. J. Yonkovich, M. Cohn. 1970. Variability in the light chain sequences of mouse antibody. Nature 228:1045.
2. Crews, S., J. Griffin, C. K. Huang, L. Hood. 1981. A single VH gene segment encodes the immune response to phosphorylcholine: somatic mutation is correlated with the class of the antibody. Cell 25:59.
3. Clarke, S. H., K. Huppi, D. Ruezinsky, L. Staudt, W. Gerhard, M. Weigert. 1985. Inter- and intraclonal diversity in the antibody response to influenza hemagglutinin. J. Exp. Med. 161:687.
4. Chien, N. C., R. R. Pollock, C. Desaymard, M. D. Scharff. 1988. Point mutations cause the somatic diversification of IgM and IgG2a antiphosphorylcholine antibodies. J. Exp. Med. 167:954.
5. Rogozin, I. B., N. A. Kolchanov. 1992. Somatic hypermutagenesis in immunoglobulin genes. II. Influence of neighbouring base sequences on mutagenesis. Biochim. Biophys. Acta 1171:11.
6. Milstein, C., M. S. Neuberger, R. Staden. 1998. Both DNA strands of antibody genes are hypermutation targets. Proc. Natl. Acad. Sci. USA 95:8791.
7. Rogozin, I. B., Y. I. Pavlov, K. Bebenek, T. Matsuda, T. A. Kunkel. 2001. Somatic mutation hotspots correlate with DNA polymerase η error spectrum. Nat. Immunol. 2:530.
8. Zeng, X., D. B. Winter, C. Kasmer, K. H. Kraemer, A. R. Lehmann, P. J. Gearhart. 2001. DNA polymerase η is an A-T mutator in somatic hypermutation of immunoglobulin variable genes. Nat. Immunol. 2:537.
9. Petersen-Mahrt, S. K., R. S. Harris, M. S. Neuberger. 2002. AID mutates E. coli suggesting a DNA deamination mechanism for antibody diversification. Nature 418:99.
10. Chaudhuri, J., M. Tian, C. Khuong, K. Chua, E. Pinaud, F. W. Alt. 2003. Transcription-targeted DNA deamination by the AID antibody diversification enzyme. Nature 422:726.
11. Pham, P., R. Bransteitter, J. Petruska, M. F. Goodman. 2003. Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature 424:103.
12. Yu, K., F.-T. Huang, M. R. Lieber. DNA substrate length and surrounding sequence affect the activation induced deaminase activity at cytidine. J. Biol. Chem. In press.
13. Sohail, A., J. Klapacz, M. Samaranayake, A. Ullah, A. S. Bhagwat. 2003. Human activation-induced cytidine deaminase causes transcription-dependent, strand-biased C to U deaminations. Nucleic Acids Res. 31:2990.
14. Dickerson, S. K., E. Market, E. Besmer, F. N. Papavasiliou. 2003. AID mediates hypermutation by deaminating single stranded DNA. J. Exp. Med. 197:1291.
15. Li, Z., C. J. Woo, M. D. Scharff. 2003. Mutations in AID and UNG extend the function of AID. Nat. Immunol. 4:945.
16. Diaz, M., M. Ray, L. J. Wheeler, L. K. Verkoczy, C. K. Mathews. 2003. Mutagenesis by AID, a molecule critical to immunoglobulin hypermutation, is not caused by an alteration of the precursor nucleotide pool. Mol. Immunol. 40:261.
17. Rada, C., G. T. Williams, H. Nilsen, D. E. Barnes, T. Lindahl, M. S. Neuberger. 2002. Immunoglobulin isotype switching is inhibited and somatic hypermutation perturbed in UNG-deficient mice. Curr. Biol. 12:1748.
18. Zan, H., A. Komori, Z. Li, A. Cerutti, A. Schaffer, M. F. Flajnik, M. Diaz, P. Casali. 2001. The translesion DNA polymerase ζ plays a major role in Ig and bcl-6 somatic hypermutation. Immunity 14:643.
19. Diaz, M., L. K. Verkoczy, M. F. Flajnik, N. R. Klinman. 2001. Decreased frequency of somatic hypermutation and impaired affinity maturation but intact germinal center formation in mice expressing antisense RNA to DNA polymerase ζ. J. Immunol. 167:327.
20. Faili, A., S. Aoufouchi, E. Flatter, Q. Gueranger, C. A. Reynaud, J. C. Weill. 2002. Induction of somatic hypermutation in immunoglobulin genes is dependent on DNA polymerase ι. Nature 419:944.
21. Spencer, J., M. Dunn, D. K. Dunn-Walters. 1999. Characteristics of sequences around individual nucleotide substitutions in IgVH genes suggest different GC and AT mutators. J. Immunol. 162:6596.
22. Foster, S. J., T. Dorner, P. E. Lipsky. 1999. Somatic hypermutation of VκJκ rearrangements: targeting of RGYW motifs on both DNA strands and preferential selection of mutated codons within RGYW motifs. Eur. J. Immunol. 29:4011.
23. Oprea, M., L. G. Cowell, T. B. Kepler. 2001. The targeting of somatic hypermutation closely resembles that of meiotic mutation. J. Immunol. 166:892.
24. Shapiro, G. S., M. C. Ellison, L. J. Wysocki. 2003. Sequence-specific targeting of two bases on both DNA strands by the somatic hypermutation mechanism. Mol. Immunol. 40:287.
25. Glazko, G. V., L. Milanesi, I. B. Rogozin. 1998. The subclass approach for mutational spectrum analysis: application of the SEM algorithm. J. Theor. Biol. 192:475.
26. Rogozin, I. B., F. A. Kondrashov, G. V. Glazko. 2001. Use of mutation spectra analysis software. Hum. Mutat. 17:83.
27. Rogozin, I. B., Y. I. Pavlov. 2003. Theoretical analysis of mutation hotspots and their DNA sequence context specificity. Mutat. Res. 544:65.
28. Yoshikawa, K., I. M. Okazaki, T. Eto, K. Kinoshita, M. Muramatsu, H. Nagaoka, T. Honjo. 2002. AID enzyme-induced hypermutation in an actively transcribed gene in fibroblasts. Science 296:2033.
29. Cooper, D. N., H. Youssoufian. 1988. The CpG dinucleotide and human genetic disease. Hum. Genet. 78:151.