DNA polymerase (pol) η participates in hypermutation of A:T bases in Ig genes because humans deficient for the polymerase have fewer substitutions of these bases. To determine whether polymerase η is also responsible for the well-known preference for mutations of A vs T on the nontranscribed strand, we sequenced variable regions from three patients with xeroderma pigmentosum variant (XP-V) disease, who lack polymerase η. The frequency of mutations in the intronic region downstream of rearranged JH4 gene segments was similar between XP-V and control clones; however, there were fewer mutations of A:T bases and correspondingly more substitutions of C:G bases in the XP-V clones (p < 10⁻⁷). There was significantly less of a bias for mutations of A compared with T nucleotides in the XP-V clones compared with control clones, whereas the frequencies for mutations of C and G were identical in both groups. An analysis of mutations in the WA sequence motif suggests that polymerase η generates more mutations of A than T on the nontranscribed strand. These in vivo data from polymerase η-deficient B cells correlate well with the in vitro specificity of the enzyme. Because polymerase η inserts more mutations opposite template T than template A, it would generate more substitutions of A on the newly synthesized strand.
Somatic hypermutation in variable and switch regions of Ig genes produces a high frequency of mutations of all four nucleotides. Hypermutation is initiated by the activation-induced cytidine deaminase (AID) protein (1), which deaminates C to U in ssDNA (2, 3, 4, 5, 6, 7, 8). The U lesion can generate mutation in two phases, as initially described by Neuberger and colleagues (9). In the first phase, U could remain in the DNA or be removed by uracil-DNA glycosylase (UNG), and a DNA polymerase (pol) would insert mutations opposite U or the UNG-generated abasic site (9, 10). This would produce mutations of C bases or of G bases if C is deaminated on the complementary strand. Mice and humans deficient for UNG have fewer transversions of C:G bases, consistent with the absence of abasic sites (11, 12). In the second phase, U could be recognized as a U:G mismatch by components of the mismatch repair system (13, 14). This would generate a repair patch, which then can be filled in by a low-fidelity pol to produce mutations opposite neighboring A and T bases. Mice deficient for mismatch repair proteins MSH2 (15, 16, 17, 18), MSH6 (19, 20, 21), and exonuclease 1 (22) have fewer mutations of A:T, indicating that they are involved in error-prone patch repair.
A closer look at the frequency of mutations of all four nucleotides reveals an anomaly that has been a major question in the hypermutation field for years. Humans and mice have ∼25% mutations of C, 25% of G, 34% of A, and 16% of T in variable regions, as recorded from the nontranscribed strand (23). The equal frequency of C and G mutations suggests that C is deaminated on both strands during phase 1. However, the unequal frequency of A and T mutations suggests that there is a bias for generating these mutations on only one strand during phase 2 (24, 25). One way to introduce strand polarity is during transcription, when the nontranscribed strand is single stranded, and the transcribed strand is complexed to mRNA. Therefore, the transcription complex may be involved in directing the phase 2 pathway to the nontranscribed strand (26). If MSH2-MSH6 and exonuclease 1 initiate a repair patch on this strand, the types of mutations would correspond to the specificity of the low-fidelity polymerase that synthesizes in the patch.
The search to identify which low-fidelity polymerases generate hypermutation has been explored intensely. Animals deficient for polymerases β, η, ι, κ, λ, and μ have been studied, and only pol η has been shown to substantially alter the spectra of nucleotide substitutions (27, 28, 29, 30, 31). We previously reported that humans with xeroderma pigmentosum variant (XP-V) disease, who are deficient for pol η (32), have fewer mutations of A:T bp in rearranged VH6 genes (28). However, Yavuz et al. (33) reported that one XP-V patient did not have fewer mutations of A:T in sequences from other rearranged genes belonging to several VH families. Recently, studies by Faili et al. (34) and Zeng et al. (35) have confirmed the role of pol η by showing a decrease of mutations of A:T in JH4 introns and switch regions from several XP-V patients.
To determine whether pol η is also involved in generating the strand bias of A vs T mutations, we sequenced introns downstream of rearranged JH4 gene segments from three XP-V patients and compared them to sequences from control individuals. The hypermutation spectrum significantly correlates with the enzymatic specificity of pol η, indicating that pol η contributes to somatic hypermutation primarily through low-fidelity synthesis of the nontranscribed strand.
Materials and Methods
Peripheral blood lymphocytes
Libraries of JH4 intronic regions
To amplify the intron downstream of rearranged JH4 gene segments, we used 5′ primers for the third framework region of the VH3-23 gene segment and 3′ primers for 320 nucleotides downstream of the JH4 gene. The following sets of nested primers were used: first set, forward, 5′-AGCCTGAGAGCCGAGGACAC-3′; reverse, 5′-GTTGTCACATTGTGACAACA-3′; and second set, forward with XbaI addition in italics, 5′-ACTCTAGACACGGCCCTATATTACTGTGC-3′; and reverse with EcoRI addition in italics, 5′-ACGAATTCAACAATGCCAGGACCCCAGG-3′. Twenty nanograms of genomic DNA were amplified with Platinum Pfx polymerase and PCR enhancer (Invitrogen Life Technologies) in a 50-μl volume using the first set of primers for 30 cycles of 95°C for 30 s, 55°C for 1 min, 68°C for 1 min, followed by a final incubation at 68°C for 10 min. Nested PCR was performed with 5 μl of the first reaction and the second set of primers using the same conditions for another 30 cycles. Products were digested and cloned into pBluescript vector (Stratagene). High-efficiency JM109-competent cells (Promega) were used for transfection. The transfection mixture was spread onto antibiotic agar plates immediately after heat-shock to prevent multiplication of identical recombinant clones. Sequencing and analysis of DNA isolated from clones were performed with the BigDye Terminator Cycle Sequencing kit v3.1 (Applied Biosystems) using T3 and T7 primers (Promega) and the Applied Biosystems 310 Genetic Analyzer. The mutation data are available upon request from I. Rogozin (firstname.lastname@example.org).
The Fisher exact test was used to compare frequencies of substitutions in A and T sites. A Monte Carlo modification of the Pearson χ2 test of spectra homogeneity (37) and the Kendall’s τ correlation coefficient (38, 39) were used to compare distribution of mutations along the intron sequence. Calculations were done using the programs CORR12 (38) and COLLAPSE (40).
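As an illustration of the Fisher exact test named above, the following minimal sketch computes a two-sided p value for a 2×2 table in plain Python. The function name `fisher_exact_two_sided` is hypothetical, and the example counts are the pooled A:T and C:G totals quoted in the Results (46 and 250 for XP-V clones, 94 and 174 for control clones). Because the published tests may have been run on differently grouped or corrected counts, the sketch is not expected to reproduce the reported p values exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (exact integer comparison, no rounding issues).
    total = sum(
        w for w in (weight(x) for x in range(lowest, highest + 1))
        if w <= observed
    )
    return total / denom


# Pooled counts quoted in the Results: rows are XP-V and control clones,
# columns are A:T and C:G mutations.
p_value = fisher_exact_two_sided(46, 250, 94, 174)
print(f"two-sided p = {p_value:.2e}")
```

The "no more likely than observed" rule is one common definition of the two-sided test; other conventions exist and can give slightly different values.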
Nucleotide sequence features can be correlated with a mutation spectrum, and the correlation can be tested for statistical significance. The significance of correlations between the distribution of mutable motifs and mutations along a target sequence was measured by a Monte Carlo procedure (the CONSEN program) (41, 42). This approach takes into account the frequencies of substitutions for each nucleotide, the possibility of multiple mutations in a site, and the context of the mutating sites. The Monte Carlo simulation was run with weighted sites, with the weight of a site defined as:
where Mj is the number of mutations in site j. Weights Wj were summed for all sites in the analyzed sequence resulting in the total weight W. A distribution of total weights Wrandom was calculated for 10,000 target sequences with randomly shuffled mutation spectra. Each of the resulting random mutation spectra contained the same number of mutations as the observed spectrum with the same distribution of mutations over randomly chosen sites. The distribution of Wrandom was used to calculate the probability P(W ≤ Wrandom). This probability is equal to the fraction of random spectra in which Wrandom is the same or greater than W. Small probability values (P(W ≤ Wrandom) ≤ 0.05) indicate a significant correlation between mutable motif and mutation frequency (41, 42).
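The shuffling procedure described above can be sketched as a simple permutation test. This is not the CONSEN program: the weight definition is not reproduced in this text, so the sketch assumes, purely for illustration, that the total weight W is the number of mutations falling in motif sites. It also ignores the nucleotide-specific substitution frequencies and sequence context that the published procedure takes into account. The names `motif_weight` and `monte_carlo_p` are hypothetical.

```python
import random


def motif_weight(counts, motif_sites):
    # ASSUMPTION: total weight W = mutations observed at motif sites.
    # The weight actually used by CONSEN may differ.
    return sum(counts[j] for j in motif_sites)


def monte_carlo_p(counts, motif_sites, n_iter=10_000, seed=0):
    """Fraction of shuffled spectra whose weight is >= the observed weight."""
    rng = random.Random(seed)
    observed = motif_weight(counts, motif_sites)
    shuffled = list(counts)
    hits = 0
    for _ in range(n_iter):
        # Shuffling keeps the number of mutations and the distribution of
        # per-site counts, but assigns them to randomly chosen sites.
        rng.shuffle(shuffled)
        if motif_weight(shuffled, motif_sites) >= observed:
            hits += 1
    return hits / n_iter


# Toy spectrum (hypothetical data): 30 sites, with all mutations
# concentrated on the three sites that carry the motif.
toy_counts = [5, 5, 5] + [0] * 27
toy_motif_sites = {0, 1, 2}
print(monte_carlo_p(toy_counts, toy_motif_sites, n_iter=2000))
```

A small returned value indicates that the observed concentration of mutations in motif sites is rarely matched by chance under this simplified null model.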
Mutation hot spots are defined using a threshold for the number of mutations at a site. The threshold is established by analyzing the frequency distribution derived from a mutation spectrum using the CLUSTERM program (〈www.itb.cnr.it/webmutation/〉) (39, 43). Briefly, this program decomposes a mutation spectrum into several homogeneous classes of sites, with each class approximated by a Poisson distribution. Variations in mutation frequencies among sites of the same class are random by definition (mutation probability is the same for all sites within a class), but differences between classes are statistically significant. Each site has a probability P(C) to be assigned to class C. A class with the highest mutation frequency is called a hot spot class. Sites with P(Chot spot) ≥ 0.95 are defined as hot spot sites. This approach ensures that the assignment is statistically significant and robust. See Ref. 39 for a detailed discussion of this approach and problems associated with its application.
Similar frequency and location of mutations in XP-V and control clones
PBL were obtained from three patients with XP-V disease and three control patients. To analyze mutations in unselected regions near variable genes, we amplified the intron downstream of the VH3-23 gene segment joined to a D gene segment and the JH4 gene segment. Both of these V and J segments are commonly found in human Igs: 10% of rearranged genes use VH3-23 (44) and 50% use JH4 (45), so the libraries of intron sequences should be diverse. The PCR primers annealed to the third framework region of VH3-23 and to a sequence 320 bp downstream of JH4 to allow sequencing of the VDJ junction as well. Only clones with unique VDJ junctions were considered for analysis. The frequency of mutation in the 320-bp intron region is shown in Fig. 1; XP-V clones had a slightly lower frequency than control clones, but the difference was not significant. Approximately 100 mutations from each individual were identified. There was no difference in the number of insertions and deletions between the XP-V and control groups. In the XP-V clones, 96% of the mutations were nucleotide substitutions, 3% were deletions of 1–54 nt, and 1% were insertions of 1–25 nt. In the control clones, 96% of the mutations were substitutions, 3% were deletions of 1–24 nt, and 1% were insertions of 1–15 nt. The location of the substitutions is plotted in Fig. 2. Five allelic nucleotides were identified at positions 71, 229, 248, 300, and 309; mutations at these positions were excluded from additional consideration. There was no significant difference in the distribution of mutations from XP-V vs control clones.
XP-V clones have fewer mutations of A:T base pairs than control clones
We have shown previously that XP-V clones had a lower frequency of mutations of A:T base pairs in the coding sequence of the VH6 gene from three patients, including XP31BE (28). Because Yavuz et al. (33) reported that the XP31BE patient did not have fewer mutations of A:T in sequences from other rearranged genes, we re-examined the DNA from XP31BE and two other patients and sequenced a different region of the H chain, namely the intron region downstream of rearranged VH3-23 to JH4 genes. As shown in Table I, there were 46 A:T mutations and 250 C:G mutations for all XP-V clones and 94 A:T mutations and 174 C:G mutations for all control clones. Thus, clones from the XP-V patients clearly had fewer mutations of A:T pairs compared with their control counterparts and correspondingly more mutations of C:G pairs (p < 10⁻⁷). Individually, there were significantly fewer A:T mutations compared with the controls for XP31BE (p = 0.02, Fisher exact two-tailed test), for XP11BR (p = 0.0002), and for XP7BR (p < 10⁻⁶). The experiments of Yavuz et al. (33) differed from ours in that they combined sequences from many VH genes from one patient, whereas we examined a single region, such as the VH6 gene or the JH4 intron, from three patients. Another difference is that allelic variants of genes were not identified in Yavuz et al. (33), whereas mutations at polymorphic positions were removed from our data sets.
| Substitution | XP11BR (84 mut), % | XP7BR (107 mut), % | XP31BE (105 mut), % | Control 1 (102 mut), % | Control 2 (84 mut), % | Control 3 (86 mut), % |
All mutations are shown from the nontranscribed strand. The sequence contains 17% A, 20% T, 31% C, and 32% G; values were corrected to represent a sequence with equal amounts of the 4 bases. Substitutions from five allelic nucleotides have been excluded from the comparison.
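The correction for base composition mentioned in the note above can be illustrated with a short sketch. The exact formula used for Table I is not given in the text, so the scaling below (each count multiplied by 25% divided by the fraction of that base in the sequence, then renormalized to percentages) is an assumption, and the raw counts in the example are hypothetical. The function name `correct_for_composition` is also hypothetical.

```python
# Base composition of the analyzed sequence, as stated in the table note.
COMPOSITION = {"A": 0.17, "T": 0.20, "C": 0.31, "G": 0.32}


def correct_for_composition(raw_counts, composition=COMPOSITION):
    """Rescale per-base mutation counts to a sequence with equal base content.

    ASSUMPTION: each count is multiplied by (1/4) / (fraction of that base),
    and the result is expressed as a percentage of all corrected mutations.
    """
    equal_share = 1.0 / len(composition)
    corrected = {
        base: raw_counts[base] * equal_share / composition[base]
        for base in composition
    }
    total = sum(corrected.values())
    return {base: 100.0 * value / total for base, value in corrected.items()}


# Hypothetical raw counts of mutated A, T, C, and G positions.
example = correct_for_composition({"A": 30, "T": 15, "C": 28, "G": 27})
for base, percent in example.items():
    print(f"{base}: {percent:.1f}%")
```

Under this scaling, a base that is rare in the sequence (here A at 17%) has its mutations weighted up, so that percentages reflect mutability per base rather than how often the base occurs.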
Pol η generates mutations of A more frequently than T on the nontranscribed strand
Hypermutation frequently occurs in WRCY (W = A or T, R = A or G, Y = C or T) and WA DNA motifs (42, 46), and the WRC motif is targeted by AID in vitro (7, 8). The frequency of mutations in these motifs and their complementary sequences is shown in Table II. For the WRC motif, there was a 3- to 4-fold excess of mutations in either C or G residues from both XP-V and control clones, compared with SYC, WYC, and SRC motifs (S = C or G). This suggests that C was targeted for mutation equally frequently on both the nontranscribed (WRC, the mutable position is underlined) and transcribed (GYW) strands. However, for the WA motif, there was a significant 2-fold excess of mutations in WA vs SA from control clones compared with XP-V clones but not for mutations in TW (Table II). This result can be explained using earlier observations about the strand specificity of pol η copying an undamaged template in vitro: WA and TW motifs were shown to be targets of pol η on the nontranscribed and transcribed strand, respectively (41, 47). Thus, the significant excess of mutations in WA motifs indicates that A on the nontranscribed strand is targeted primarily for mutation by pol η in vivo.
| Motifs (a) | Increase in mutations (b), XP-V | Increase in mutations (b), Control |
(a) Number of mutations in mutable motifs was calculated for the underlined bases. The first and third lines list the motif on the nontranscribed strand, and the second and fourth lines list the complementary sequence on the transcribed strand.
(b) Values listed represent the fold increase in mutations at 22 WRC, 15 GYW, 13 WA, and 16 TW sites compared with mutations at 263 other sites. Bold italicized numbers represent a significant increase in mutations at mutable motifs (P(W ≤ Wrandom) ≤ 0.05), as revealed by using a Monte Carlo procedure (41, 42).
Strand bias for A mutations correlates with the specificity of pol η
Additional evidence for fewer mutations of A compared with T on the nontranscribed strand in XP-V clones becomes apparent when mutations at the four nucleotides are tabulated in Fig. 3. A significant excess of mutations in A vs T was found in the control spectra (p = 0.003, the binomial test), whereas no difference was found in the XP-V spectra (p = 0.76). To determine which mutations of A were affected in the absence of pol η, we examined the types of mutations listed in Table I. The frequency of each type of A and T mutation is plotted in Fig. 4; the category that decreased the most in the XP-V clones compared with the control clones was the A to G substitutions (p = 0.004, with the Bonferroni correction for six binomial tests). The frequency of errors generated by human pol η copying an undamaged template in vitro (47) is also shown in Fig. 4. The mispair frequency is highest for A to G substitutions, which represents G being incorporated opposite template T. Thus, the data in vivo correlate with the specificity of pol η in vitro (the linear correlation coefficient CC = 0.79, PCC = 0.02; the frequencies of 12 types of substitutions generated by pol η in vitro were correlated against differences between the frequencies of 12 types of substitutions in XP-V and control clones).
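The binomial test for an excess of A over T mutations can be sketched as follows, assuming the null hypothesis that a mutation at an A:T pair is equally likely to be recorded as A or T (probability 0.5 each). The counts behind Fig. 3 are not given in the text, so the example numbers are hypothetical, as is the function name `binomial_two_sided`.

```python
from math import comb


def binomial_two_sided(k, n):
    """Exact two-sided binomial p value for k successes in n trials, p = 0.5.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one; with p = 0.5 this reduces to comparing binomial
    coefficients, so the comparison is done with exact integers.
    """
    observed = comb(n, k)
    extreme = sum(
        comb(n, i) for i in range(n + 1) if comb(n, i) <= observed
    )
    return extreme / 2 ** n


# Hypothetical counts: 60 mutations of A and 30 of T among 90 A:T mutations.
print(f"p = {binomial_two_sided(60, 90):.4f}")
```

When several such tests are run on the same data, as for the six types of A and T substitutions, a Bonferroni correction multiplies each p value by the number of tests (capped at 1) before comparing it with the significance threshold.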
Model for hypermutation
A basic model for hypermutation is proposed, based on genetic and biochemical evidence, for involvement of the following proteins. AID (1) is somehow targeted to Ig variable and switch regions during transcription (48, 49), and it deaminates C to U (2, 3, 4, 5, 6, 7, 8). UNG glycosylase (9, 10, 11, 12) removes some uracils to produce abasic sites. Other uracils remain as a U:G mispair and bind to the MSH2-MSH6 heterodimer (13, 14, 20, 21), which then recruits an unspecified endonuclease to make a nick in the DNA. MSH2 also attracts both exonuclease 1 (22) to produce a gap at the nicks and DNA pol η (28, 34, 35) to fill in the gap. MSH2-MSH6 then stimulates the catalytic activity of pol η (14), which allows it to generate mutations opposite all four bases.
However, although mutations occur at all four nucleotides, the frequency of mutation is skewed in variable genes so that there are twice as many mutations of A compared with T as recorded from the nontranscribed strand, whereas there are equal frequencies of mutations of C and G. This strand bias for A vs T mutations but not C vs G has been an enigma for many years and suggests that A:T mutations occur preferentially on one strand, whereas C:G mutations occur on both strands. Because DNA pol η participates in mutations of A:T base pairs in Ig genes, we investigated if it is also responsible for the A:T strand polarity.
Pol η generates significant mutations of A:T bases
The involvement of pol η in hypermutation was confirmed by sequencing clones containing the intronic region downstream of rearranged VH3-23 to JH4 gene segments in DNA from three XP-V patients who were deficient for pol η. Around 300 mutations were compared with a similar number obtained from three control individuals. Although the frequencies of mutation were similar between the two groups, there were significantly fewer mutations of A:T base pairs in the XP-V clones compared with control clones and correspondingly more mutations of C:G. This is consistent with previous data from several XP-V patients showing a decrease in A:T mutations in the coding sequence of VH6 genes (28), in JH4 introns from several rearranged VH genes (34), and in the μ-γ switch regions (34, 35).
Pol η synthesizes primarily on the nontranscribed strand
Mutations of A:T in Ig genes occur preferentially in the WA/TW sequence motif (41, 47). In the control clones, mutations were overrepresented in WA on the nontranscribed strand by 2-fold compared with random sequences, whereas no increase was found in the XP-V clones. However, there was no increase in the corresponding TW motif in either group, which would represent WA on the complementary strand. A decline in substitutions of A relative to T can also be seen in the data from the μ-γ switch regions from these patients (35) and in the JH4 intron and μ switch region from two other XP-V patients (34). This suggests that pol η inserts substitutions preferentially on the nontranscribed strand.
In contrast, in the WRC/GYW motif, there was a 3- to 4-fold increase in mutations of both C and G in control and XP-V clones (Table II). This suggests that mutations of C can occur on both strands in the presence or absence of pol η. Equal frequencies of C and G mutations in variable regions suggest that AID deaminates C to U on both DNA strands (50). This may occur during transcription when both strands are single stranded at the trailing edge of the transcription bubble or possibly if they are supercoiled (51). However, in switch regions, there is a preference for C mutations on the nontranscribed strand compared with G mutations (35), which may reflect the formation of stable R-loops in switch DNA (52) to expose the nontranscribed strand for deamination. The two regions also differ in that AID deamination of variable regions requires replication protein A (49), which may stabilize ssDNA, whereas deamination of switch regions does not require the cofactor (3).
Specificity of pol η correlates with A:T asymmetry
Unequal frequencies of A and T mutations suggest that either base is preferentially mutated or repaired on one DNA strand. This would occur as error-prone synthesis extends past the U lesion into neighboring A and T nucleotides. As noted above, the analysis of mutations in the WA/TW motif suggests that synthesis occurs on the nontranscribed strand (Table II and Ref. 53). Recruitment to U on the nontranscribed strand may occur via the multiprotein transcription complex, TFIIH, which directs RNA polymerase II to the transcribed strand by an unknown process. This asymmetry may also deposit the mismatch repair proteins and pol η on the opposite nontranscribed strand.
During gap filling or strand displacement of the repair patch, pol η would then synthesize DNA on the nontranscribed strand, using the transcribed strand as a template. In this case, pol η should have a strong tendency to insert a mismatched base opposite template T but not opposite templates A, G, or C. This is exactly the specificity of human and mouse pol η in vitro when copying undamaged templates (41, 47, 53). Specifically, there is a 5-fold increase in incorporation of G opposite T, which would represent an A to G substitution, compared with the other 11 possibilities (Fig. 4 and Ref. 47). In XP-V clones, the frequency of A to G mutations declined the most significantly in the mutational spectra, which is consistent with the mutational pattern of pol η. If the template is the transcribed strand, then pol η would generate A to G substitutions in excess when synthesizing a repair patch on the nontranscribed strand. Although other roles for pol η to generate A:T mutations have been suggested (54, 55), the ability of the enzyme to catalyze error-prone synthesis on DNA is the most logical explanation for its function in hypermutation.
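The relationship between a polymerase mispair and the substitution recorded on the newly synthesized strand can be made explicit with a small sketch. It assumes, as in the model above, that the transcribed strand is the template and that mutations are read from the newly synthesized nontranscribed strand. The function name `substitution_from_mispair` is hypothetical.

```python
# Watson-Crick partner of each template base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def substitution_from_mispair(template_base, inserted_base):
    """Return (original_base, new_base) as read from the synthesized strand.

    The base expected on the new strand is the complement of the template
    base; a different inserted base is recorded as a substitution of the
    expected base. Correct insertion returns None.
    """
    expected = COMPLEMENT[template_base]
    if inserted_base == expected:
        return None
    return (expected, inserted_base)


# G inserted opposite template T replaces the expected A on the new strand,
# which is scored as an A to G substitution.
print(substitution_from_mispair("T", "G"))
```

Under the same assumption, errors made opposite template A would appear as substitutions of T on the nontranscribed strand, which is why a preference for misinsertion opposite T over A translates into more mutations of A than of T.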
Which polymerases produce the remaining mutations?
In the absence of pol η, the residual A:T mutations could be generated by other low-fidelity polymerases that bypass an abasic site generated by UNG glycosylase at deaminated C bases. This would produce mutations of C and G and, perhaps less efficiently, mutations of adjacent A and T through strand displacement. There would be no A:T bias if this occurred on both strands. As noted earlier, mice deficient for several other error-prone polymerases had normal frequencies of mutation of all four nucleotides, which has confounded the problem of identifying another polymerase. Because pol η appears to be a major player in hypermutation, it may compensate in the absence of other polymerases and synthesize all the mutations. When pol η-deficient mice become available, it will be possible to study mice doubly deficient for pol η and other candidate polymerases to identify which one also participates in bypassing abasic sites to generate the C:G and residual A:T mutations. Significantly, although there are some A:T mutations in mice deficient for either UNG or MSH2, there are no A:T mutations in mice deficient for both UNG and MSH2 (56). This indicates that A:T mutations can occur during both the phase 1 pathway caused by bypass of UNG-generated abasic sites and the phase 2 pathway caused by repair of MSH2-generated patches. It will be interesting to see how pol η participates in each pathway by creating mice doubly deficient for pol η and UNG vs pol η and MSH2.
We thank William Yang and Elena Vasunina for technical assistance, and Stella Martomo and Michael Seidman for thoughtful comments.
The authors have no financial conflict of interest.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
This research was supported by the National Institutes of Health Intramural Research Program, Russian Fund for Basic Research, and MedCen Foundation.
Abbreviations used in this paper: AID, activation-induced cytidine deaminase; UNG, uracil-DNA glycosylase; pol, DNA polymerase; XP-V, xeroderma pigmentosum variant.