Abstract
Secondary Ig gene diversification relies on activation-induced cytidine deaminase (AID) to create U:G mismatches that are subsequently fixed by mutagenic repair pathways. AID activity is focused to Ig loci by cis-regulatory DNA sequences named targeting elements. In this study, we show that in contrast to prevailing thought in the field, the targeting elements in the chicken IGL locus are distinct from classical transcriptional enhancers. These mutational enhancer elements (MEEs) are required over and above transcription to recruit AID-mediated mutagenesis to Ig loci. We identified a small 222-bp fragment in the chicken IGL locus that enhances mutagenesis without boosting transcription, and this sequence represents a key component of an MEE. Lastly, MEEs are evolutionarily conserved among birds, both in sequence and function, and contain several highly conserved sequence modules that are likely involved in recruiting trans-acting targeting factors. We propose that MEEs represent a novel class of cis-regulatory elements for which the function is to control genomic integrity.
Activation-induced cytidine deaminase (AID) is a DNA mutator enzyme that initiates the secondary Ab gene diversification processes somatic hypermutation (SHM), Ig gene conversion (GCV), and class-switch recombination (CSR) (1–3). AID converts cytosines to uracils in the context of ssDNA that is generated during transcription, and the resulting U:G mismatches are fixed by direct replication or repaired by error-prone DNA repair pathways (4, 5). SHM and GCV are closely related and alter the nucleotide sequence in the VJ (or VDJ) exon of Ig genes, thus modifying Ab specificity. In contrast, CSR leaves the Ag-binding sites unaltered and swaps the C region from the Cμ isotype to Cγ, Cε, or Cα, altering the effector function of the encoded Ab.
Although these processes are restricted to Ig genes, and in the case of CSR specifically to the IgH (IGH) locus, low levels of AID activity have also been reported for a much larger range of genes (6–8). Such mistargeting can lead to point mutations (e.g., in BCL6) or translocations (C-MYC/IGH) and likely represents a key event in the formation of B cell lymphomas characterized by such genetic alterations (9–12). The very robust targeting of AID-mediated sequence diversification processes to Ig loci has been the focus of intense investigation over almost two decades and is thought to be mediated by cis-regulatory DNA sequences, in particular transcription enhancers (13, 14). We recently identified the first, to our knowledge, such targeting element in the Ig L chain (IGL) genes of chicken DT40 B cells (15). This cis-regulatory sequence resides within a 4-kb region downstream of a transcriptional enhancer element, and this region is now referred to as the chicken IGL 3′-regulatory region (3′RR). Two independent research groups confirmed our findings and recently reported large noncoding DNA elements with targeting activity that overlap with our 3′RR and are referred to as diversification activator (16) or Region A (17), respectively. However, several outstanding questions remained: 1) the identity of the minimal DNA sequence critical for targeting; 2) the trans-acting factors binding to this sequence and their mode of action; and 3) whether transcriptional enhancers and mutational enhancers are unified in the very same DNA elements.
In this study, we provide the first definitive evidence, to our knowledge, that distinct mutational enhancer elements (MEEs) are required over and above transcriptional enhancer activity to recruit AID-mediated sequence diversification to the chicken IGL locus. We show that two independent and redundant MEEs exist in this locus. In addition, we defined the location of one of them to within 1.5 kb and have experimental evidence that a critical part of the MEE resides within a small 222-bp region. Lastly, we demonstrate that the targeting activity is evolutionarily conserved, as an orthologous fragment from zebra finch (Taeniopygia guttata) is functional when placed into the IGL locus of DT40 cells. Sequence alignments suggest that small modules highly conserved during evolution provide the platform on which trans-acting targeting factors assemble to mediate SHM/GCV targeting.
Materials and Methods
Oligonucleotides
The sequences of all oligonucleotides used in this study are listed in Table I.
Luciferase Constructs (See Fig. 1) | Oligonucleotide Sequences (5′–3′) |
---|---|
IgL promoter | |
PLF | GACTCTCGAGTGGGAAATACTGGTGATAGG |
PLRH | GACTAAGCTTGGCGGAATCCCAGCAGCTGTGTGTC |
Enh (467-bp enhancer) | |
CHEFB1 | GACTAGATCTCAGCTGGGGCCACACAAAGAG |
CHERB1 | GACTGGATCCCTGGAAGCAGGCAGGAGTCGTG |
Fragment 1 | |
KR1F | GTCAGGATCCCCCGGCAAGTGGCGGCTGCT |
KR1R | GTCAAGATCTACCCACAGCTGGCCGTGGCATC |
Fragment 2 | |
KR2F | GTCAGGATCCTCCGCCACAACCGCTCCGCA |
KR2R | GTCAAGATCTAGCGTCCTGCTGGACAGCAGGC |
Fragment 3 | |
KR3F | GTCAGGATCCGCCCACTCTCATTGCGGTGCT |
KR3R | GTCAAGATCTAAATGTACGCAGCCCAGGAG |
Fragment 4 | |
KR4F | GTCAGGATCCCACGGCACAAAGGTGTTTAT |
KR4R | GTCAAGATCTATGAGGATCCGCTTTGCTAATGAGCAGAAG |
Fragment 5 | |
KR5F | GTCAGGATCCGTGCCCTGGTGCTCTGCAAT |
KR5R | GTCAAGATCTTCCAGTCTGCAGCGTGTGCAT |
Fragment 6 | |
KR6F | GTCAGGATCCTGTGGTCACTGCTGGGCTCT |
KR6R | GTCAAGATCTAAGCAGAGCCAGGAGCAGGA |
Fragment 7 | |
KR7F | GTCAGGATCCTGGCTGCGGTCAGCACATCT |
KR7R | GTCAAGATCTACTGTGGGCAGCAGGCTGAA |
Fragment A | |
KR7.2F | GACTGGATCCGAGACACGGATGGAGCAGTGTG |
KR7R | GTCAAGATCTACTGTGGGCAGCAGGCTGAA |
Fragment B | |
KR4F | GTCAGGATCCCACGGCACAAAGGTGTTTAT |
KR7R | GTCAAGATCTACTGTGGGCAGCAGGCTGAA |
Fragment C | |
KR7.1F | GACTGGATCCGCTCCGAGGCCACAAGCCCT |
KR7R | GTCAAGATCTACTGTGGGCAGCAGGCTGAA |
Fragment D | |
KR7.1F | GACTGGATCCGCTCCGAGGCCACAAGCCCT |
KR7.2R | GACTAGATCTCACACTGCTCCATCCGTGTCTC |
Fragment E | |
KR7F | GTCAGGATCCTGGCTGCGGTCAGCACATCT |
KR3R | GTCAAGATCTAAATGTACGCAGCCCAGGAG |
Fragment F | |
KR7F | GTCAGGATCCTGGCTGCGGTCAGCACATCT |
KR7.1R | GACTAGATCTAGGGCTTGTGGCCTCGGAGC |
ZFTE fragment (from zebra finch IGL locus) | |
ZFTEF | GGATCCAAACTAATCAGTCCCTGCCTGC |
ZFTER1 | GGATCCTGTGGCTCTGCTGGGAATG |
Targeting vectors for deletion analysis | |
ΔME6.1 and ME6.1S (downstream homology arm) | |
Eraf53B | GACTGGATCCCAGCAATTCACAGAAACATTG |
Erar21X | GACTCTCGAGCAGGGCTGCAATAAAGGTGAG |
ΔME6.2 (downstream homology arm) | |
hypR3A | AATTGGATCCCAGGGAGCTCACCTTTATT |
hypF1A | GATACTCGAGTTGTATTTCCCATCCTGGTG |
Δ1.5K (upstream homology arm) | |
C9N | GACTGCGGCCGCTCAACAGATCAGCACTGGAGAC |
D3KR | GACTGGATCCGCGTGGTGGGAGCGGGCAGG |
Δ1.5K (downstream homology arm) | |
ERAF54B | GACTGGATCCCAGGCTCTGGTCCCATCTCACTG |
Erar21X | GACTCTCGAGCAGGGCTGCAATAAAGGTGAG |
Southern and Northern blot probes | |
CL probe (used for Northern blots to determine IGL transcript levels and Δ1.5K Southern blots) | |
CCLF1 | CCCACCGTCAAAGGAGGAGCTG |
CHVLR1 | CAGTAGATCTTTAGCACTCGGACCTCTTCAGG |
GAPDH probe | |
CHGAPDHF | ACCAGGGCTGCCGTCCTCTC |
CHGAPDHR | TTCTCCATGGTGGTGAAGAC |
ER probe (used for Southern blot genotyping of all clones except for Δ1.5K) | |
erar4 | AGCACAGAACAGGCACGTGCT |
QG11R | GACGTTGATGTGGACGATGTG |
Plasmids
To generate the enhancerless pIgLP luciferase reporter construct, a 425-bp IgL promoter fragment was amplified from DT40 genomic DNA using the oligonucleotide pair PLFX/PLRH and cloned as an XhoI/HindIII fragment into pGL3-Basic (Promega) digested with the same enzymes. To create all other reporter plasmids, the fragments enhancer (Enh), 1–7, A, B, C, D, E, F, and zebra finch targeting element (ZFTE) were amplified from DT40 genomic DNA (or zebra finch genomic DNA, in the case of ZFTE) using the primer pairs listed (Table I), cloned into the pCR4-TOPO vector (Invitrogen), sequenced, and subsequently inserted as BamHI/BglII fragments into the BamHI site of pIgLP. As we were unable to amplify fragment 3′RR+ directly from genomic DNA, it was assembled from three subfragments using the internal unique NdeI and BsmBI sites.
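The cloning strategy above relies on restriction sites built into the 5′ ends of the primers. As a minimal sketch (not part of the original methods), the following Python snippet scans a few primers copied from Table I for the recognition sequences of the enzymes named in this section. The function name `find_sites` and the choice of primers are illustrative assumptions; the recognition sequences are the standard ones for these enzymes.

```python
# Standard recognition sequences of the enzymes named in the cloning strategy.
# BamHI (G^GATCC) and BglII (A^GATCT) both leave 5'-GATC overhangs, which is
# why BamHI/BglII fragments can be ligated into a BamHI-cut vector.
SITES = {
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "XhoI": "CTCGAG",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
}

# A subset of primers copied from Table I (illustrative selection).
PRIMERS = {
    "PLF": "GACTCTCGAGTGGGAAATACTGGTGATAGG",
    "PLRH": "GACTAAGCTTGGCGGAATCCCAGCAGCTGTGTGTC",
    "KR7F": "GTCAGGATCCTGGCTGCGGTCAGCACATCT",
    "KR7R": "GTCAAGATCTACTGTGGGCAGCAGGCTGAA",
    "C9N": "GACTGCGGCCGCTCAACAGATCAGCACTGGAGAC",
}


def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    hits = {}
    for enzyme, site in SITES.items():
        positions = [
            i
            for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site
        ]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

In each of these primers the site sits immediately after a 4-nt 5′ overhang, consistent with the enzyme pairs named for each cloning step.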
The gene-targeting plasmids for ΔME6.2/ΔME6.1/ΔME6.1S were generated using the targeting vector for the DT40 ΔME cell line as a backbone, which contains the intronless VJ-C exon in its left arm (15). The right arm was replaced with the respective PCR fragments that were cloned as BamHI/XhoI fragments. The loxP-flanked puromycin selection cassette was cloned as a BamHI (from pLoxPuro) (18) or a BamHI/BglII fragment (from pBSK BBH puro) into the unique BamHI site located between the left and the right arm. Where required, the SV40 enhancer was inserted as a BamHI/BglII fragment in the unique BamHI site next to the puromycin selection cassette.
The gene-targeting plasmid for the Δ1.5K line was generated by cloning respective PCR products as NotI/BamHI and BamHI/XhoI fragments into pBluescript and inserting the puromycin selection cassette as described above. For the knockin constructs, the enhancer fragments B, C, and ZFTE were inserted as BamHI/BglII fragments into the unique BamHI site next to the puromycin cassette in the ΔME6.1 targeting vector.
Southern blots
Ten micrograms of genomic DNA was digested with the respective restriction enzymes and separated on 0.8% agarose gels in 1× TBE. Subsequently, the gel was treated with HCl to fragment high m.w. DNA, and the DNA was denatured and transferred to GeneScreen Plus (PerkinElmer) membranes using alkaline transfer buffer (0.4 M NaOH, 1 mM EDTA). Probes were amplified from genomic DNA using the primer pairs listed (Table I) and radiolabeled with [32P]–α-deoxy-CTP using the NEBlot kit (New England Biolabs). Blots were hybridized at 62°C overnight and washed, and the band patterns were detected using phosphorimaging screens and a Storm 860 phosphorimager (GE Healthcare Life Sciences).
Northern blots
Total RNA was isolated using RNA-Bee (Tel-Test) according to the manufacturer’s instructions. Ten micrograms of RNA was loaded on 1% agarose gels with formaldehyde and separated in 1× MOPS at 80 V. RNA was transferred to GeneScreen Plus membranes in 10× SSC (150 mM sodium citrate and 1.5 M NaCl). Blots were hybridized (see Materials and Methods, Southern blots) with an IGL C region probe (amplified using primer pair CCLF1/CHVLR1) and a GAPDH probe (amplified using primer pair CHGAPDHF/CHGAPDHR). Blots were visualized using phosphorimaging and quantified using ImageQuant software (Molecular Dynamics).
Cell culture
DT40 cells were grown in RPMI 1640 medium (Mediatech) supplemented with 10% FBS (Invitrogen), 1% chicken serum (Sigma-Aldrich), 10 mM HEPES, 2 mM L-glutamine, and penicillin/streptomycin. Cell cultures were maintained at 41°C in 5% CO2.
Luciferase assays
All luciferase assays were performed using the Dual-Luciferase Assay Kit (Promega) according to the manufacturer’s protocol. Briefly, 1 × 10⁶ cells were transfected with 1 μg luciferase reporter plasmid and 1 μg pRL-SV40 (Promega) using the Amaxa Nucleofector T kit (Lonza) with program B023. After 48 h, cells were harvested and lysed, and luciferase activities were measured using a Monolight 2010 luminometer (BD Biosciences).
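Dual-luciferase readouts are commonly expressed as the ratio of firefly (reporter) to Renilla (pRL-SV40 control) signal, and then as fold change over a reference construct. The text does not spell out the normalization used here, so the sketch below is an assumption about a typical workflow, with hypothetical function names and made-up luminometer readings.

```python
def relative_activity(firefly, renilla):
    """Firefly signal normalized to the co-transfected Renilla control."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_over_reference(sample, reference):
    """Fold change of a (firefly, renilla) sample over a reference construct.

    `reference` would typically be the enhancerless reporter (pIgLP here),
    but that choice is an assumption, not stated in the methods.
    """
    return relative_activity(*sample) / relative_activity(*reference)


if __name__ == "__main__":
    # Hypothetical raw readings (arbitrary light units), not data from the study.
    enhancerless = (1000.0, 500.0)
    test_fragment = (8000.0, 400.0)
    print(fold_over_reference(test_fragment, enhancerless))
```

Normalizing to the Renilla control corrects for differences in transfection efficiency between samples before constructs are compared.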
Gene targeting
The ΔM cell line was used as the parent for all clones generated in this study, except for the Δ1.5K line, for which wild-type cells are the parental line. Transfections were performed by electroporation (580 V, 25 μF, ∞Ω) with 25–30 μg linearized targeting plasmids. Stable integrants were selected using 0.5 μg/ml puromycin and the genotypes of individual clones determined by Southern blot analyses. Clones that carried a randomly integrated copy in addition to the targeted integration were discarded. The puromycin resistance cassettes were removed using recombinant cell-permeable Cre protein as described previously (15).
SHM/GCV analyses
The assays for GCV (by FACS) and SHM plus GCV (by DNA sequencing) were performed and analyzed as described previously (15). For the IGH locus, we consider any sequence that has more than one nucleotide change as a single event, as we do not have the sequence information of all pseudo-VH elements that would be required to assign distinct donor sequence to each observed nucleotide change. Hence the IGH frequencies represent the results of conservative lower-limit calculations.
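The conservative counting rule for the IGH locus can be illustrated with a short sketch. It assumes hypothetical per-read counts of nucleotide changes and a uniform read length; the function names and numbers are illustrative and are not taken from the analysis pipeline used in the study.

```python
def igh_event_frequency(changes_per_read, read_length_bp):
    """Lower-limit event frequency (events/bp).

    Any read with one or more nucleotide changes contributes exactly one
    event, mirroring the rule described for the IGH locus.
    """
    events = sum(1 for n in changes_per_read if n > 0)
    total_bp = len(changes_per_read) * read_length_bp
    return events / total_bp


def naive_frequency(changes_per_read, read_length_bp):
    """Upper comparison: every nucleotide change counted as its own event."""
    total_bp = len(changes_per_read) * read_length_bp
    return sum(changes_per_read) / total_bp


if __name__ == "__main__":
    # Hypothetical data: six reads of 500 bp with 0, 0, 1, 3, 0 and 2 changes.
    reads = [0, 0, 1, 3, 0, 2]
    print(igh_event_frequency(reads, 500))  # conservative estimate
    print(naive_frequency(reads, 500))      # per-change count, for contrast
```

Because clustered changes that could stem from one gene conversion tract are collapsed, the conservative estimate can never exceed the per-change count.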
Sequence alignments
Bioinformatic sequence analyses were performed using the Web-based tools Pipmaker (http://bio.cse.psu.edu/pipmaker) (19), MULAN (http://mulan.dcode.org/), and multiTF (http://multitf.dcode.org/) (20).
GenBank submission
The sequences of the condor IGL locus fragments were deposited at GenBank (http://www.ncbi.nlm.nih.gov) under the accession numbers HQ414233 and JF693631.
Results
The chicken IGL locus
The rearranged chicken IGL locus consists of a single set of functional leader L, VJ, and C region exons (Fig. 1A). Upstream of these coding segments are 25 pseudo-V elements, and downstream of them a 467-bp minimal Enh element has been identified (Fig. 1A) (21). The 3′RR that contains a targeting element for SHM/GCV is located downstream of this enhancer (15), and the only readily identifiable sequence element in this area is a nonfunctional copy of a CR1 retrotransposon (22). This CR1 retrotransposon is not unique to the DT40 cell line, as it is also present in the published chicken genome assembly. However, the functional importance (if any) of this element for the IGL locus is unclear. Lastly, the 3′RR is also essential for transcription of the IGL locus (15), and hence our approach was to identify sequence elements of functional importance within this region.
The chicken DT40 cell line serves as our experimental system. Although it is a cell line model, it is commonly thought that the molecular mechanism of SHM and GCV in these cells, in particular with respect to targeting, is likely a good reflection of what occurs in primary B cells. Our experiments use the endogenous IGL promoter and IGL gene as a mutation readout instead of employing a more artificial enhanced GFP reporter system.
Transcription enhancers in the chicken IGL locus
To define the transcriptional enhancers in the 3′RR and its vicinity, luciferase assays were employed (Fig. 1), and Fragment 7, which contains the previously defined minimal enhancer (21), showed the highest luciferase activity, ∼2-fold higher than Enh alone (Fig. 1B). Further dissections defined fragment B (Fig. 1C) as the minimal DNA fragment providing the strongest enhancer activity in our transient luciferase assays. The increase in enhancer activity of fragment B over fragment 7 is likely a result of a deletion of negative regulatory elements in the 3′ end of fragment 7.
Separation of transcription and mutation enhancer function
To determine whether fragment B contained both transcription and mutation enhancer activity in the context of the endogenous locus, knockin clones were generated (for details, see Materials and Methods and Table I) in which fragment B was placed into the empty IGL locus of the DT40 ΔME6.1 line (FragB, Fig. 2A, Supplemental Fig. 1A). The ΔME6.1 cells showed neither transcription nor SHM/GCV (Fig. 3 and discussed below) and thus served as a platform to test the properties of respective cis-regulatory elements. Steady-state transcript levels of the IGL gene in FragB cells were comparable to those in the parental ΔM cell line (Fig. 2B). Thus, fragment B acts as a transcriptional enhancer in the context of the endogenous IGL locus.
To test whether this element is also sufficient to drive SHM/GCV, subclones of the FragB lines were continuously cultured for 4 wk starting from single-cell clones. The VJ exon in these clones contains a premature stop codon, and its reversion by GCV leads to the appearance of surface IGM+ cells. Although flow cytometry analyses of parental ΔM clones showed on average 3.72% IGM+ cells after 4 wk, such IGM+ populations did not arise in FragB lines (Fig. 2D). DNA sequencing confirmed this absence of SHM and GCV (Fig. 2C), with mutation frequencies being measured below the background of 5 × 10⁻⁵ events/bp observed in AID−/− DT40 cells (23). The IGH locus mutation frequency of 3.47 × 10⁻⁴ events/bp (Fig. 2C) was comparable to that of 4.79–6.22 × 10⁻⁴ events/bp routinely observed in DT40 cells (15), indicating that the lack of SHM/GCV in the IGL locus was caused solely by the manipulation of the IGL locus itself. Thus, we concluded that fragment B is sufficient to drive normal levels of IGL transcription, but is unable to support SHM/GCV. This provides definitive evidence for the concept that transcriptional enhancer function and mutational enhancer function are physically separable in the DT40 system.
To determine whether the addition of noncoding sequences could indeed increase mutation levels without altering transcription, we generated fragment C, which extended 222 bp beyond the 3′ end of fragment B (Fig. 2A). As predicted, IGL transcript levels in the corresponding FragC DT40 knockin line (Fig. 2A, 2B, Supplemental Fig. 1B) remained unaltered. This indicated that these additional 222 bp of DNA are dispensable for transcription. Importantly, however, AID-mediated sequence diversification was partially restored in these FragC clones. An IGM+ cell population became readily detectable in our sensitive flow cytometry-based IGM reversion assay (Fig. 2D) and reached levels of ∼50% of the parental ΔM clone that was used to generate this genotype by gene targeting. Although the DNA sequencing approach is less sensitive (as it is a PCR-based assay), the mutation event frequency measured in FragC lines was still clearly above the background of our system. In summary, this demonstrated that a small fragment from the 3′RR contains at least parts of an MEE critical for SHM/GCV targeting and that addition of this sequence to the transcriptional enhancer now recruits AID activity without altering transcription.
A pair of MEEs in the chicken IGL locus
To determine where within the 3′RR the MEE (i.e., the SHM/GCV targeting element) resides, we generated DT40 lines with smaller deletions of this region (Fig. 3, Supplemental Fig. 1C–E). Importantly, even deleting as little as 1.5 kb of the 3′RR in the ΔME6.1S line resulted in a complete disruption of SHM/GCV specifically in the IGL locus (Fig. 3). Thus, we concluded that an MEE for SHM/GCV resides in the 1.5 kb between the Enh enhancer and the CR1 retrotransposon. This location is fully consistent with our observations in the FragC line in which a small fragment from this 1.5-kb element clearly enhanced mutation (Fig. 2D).
To determine whether this MEE is also essential for AID-mediated sequence diversification in the context of the intact wild-type IGL locus, we generated the Δ1.5K DT40 line (Fig. 3, Supplemental Fig. 2A). Somewhat surprisingly, this genotype was still able to drive SHM/GCV in the IGL locus robustly at 70% of the level observed in wild-type DT40 cells [4.07 × 10⁻⁴ events/bp sequenced (15)]. This strongly suggested that a second targeting element exists upstream of the minimal Enh enhancer element.
The ΔM line lacking the VJ-C intron and the ΔE line lacking the Enh enhancer showed wild-type levels of SHM/GCV, but the deletion of 2.3-kb noncoding DNA between the C region exon and the enhancer in the ΔME line reduced SHM/GCV to 50% (15). As this is comparable to what we observed in our Δ1.5K line, we inferred that an additional MEE resides in that area. Hereafter, we will refer to the MEEs upstream and downstream of the enhancer as 5′MEE and 3′MEE, respectively. As the individual deletion of each element resulted in similarly modest effects on SHM/GCV, we inferred that these elements might be redundant in function. Alternatively, one MEE could drive SHM, whereas the other supports GCV. Importantly, however, the ΔME and Δ1.5K line lacking the 5′MEE or 3′MEE, respectively, show evidence for both SHM and GCV (Fig. 4). Thus, we conclude that these two elements are largely redundant.
Evolutionary conservation of targeting activities
To determine whether MEEs are a unique feature of chicken B cells, a search for such elements in another species was initiated. As standard sequence comparison algorithms were unable to detect significant similarities between the noncoding regions of the IGL locus of chicken and those of humans, mice, rats, lizards, and teleost fish, we decided to focus on a more closely related species, the zebra finch (Taeniopygia guttata). An 8-kb contig of the zebra finch IGL locus was assembled by a combination of in silico and PCR experiments and largely matched the current genome assembly (which became available while these studies were ongoing). A sequence comparison using Pipmaker revealed four areas of strong homology between this region and the chicken IGL locus (Fig. 5A). Interestingly, there is a stretch of strong homology upstream of the Enh enhancer in an area where we predicted the location of the 5′MEE (discussed below), but also to the 1.5-kb region of the 3′RR to which we mapped the chicken 3′MEE. Hereafter, we focus on the latter homologous sequence element and refer to it as the ZFTE.
To test whether the ZFTE is able to promote GCV and SHM in the context of the chicken locus, knockin cell lines were generated in which the chicken targeting element was replaced with this fragment (Fig. 5B, Supplemental Fig. 2B). The ZFTE was fully able to promote transcription of the chicken IGL gene (Fig. 5C), and evidence for ongoing sequence diversification by SHM/GCV was readily observed by flow cytometry (Fig. 5E). Furthermore, DNA sequencing also showed clear evidence for ongoing SHM/GCV (Fig. 5D), albeit at levels below those observed in the parental ΔM line (15). Interestingly, AID-mediated mutagenesis in the IGH locus of the ZFTE DT40 cells was similarly reduced in two independently generated lines (Fig. 5E), but the reason for this effect remains unknown. Overall, these observations strongly suggest that the MEEs for AID-mediated sequence diversification are evolutionarily conserved.
A common strategy to find conserved binding sites in cis-regulatory elements in rapidly evolving loci is to look at comparisons between closely related species. Thus, we isolated and sequenced bacterial artificial chromosome clones containing the IGL locus of the California condor (Gymnogyps californianus) and identified a region that showed striking homology to both the chicken and the zebra finch locus (a detailed description of the California condor IGL locus will be published elsewhere). Three-way alignments using MULAN revealed two stretches with strong sequence homology, one upstream and one downstream of the Enh enhancer (Fig. 6A). Strikingly, the downstream region partially overlaps with the small 222-bp sequence element that exhibited MEE activity in the FragC line (Fig. 6B), and hence we conclude that this area of homology might represent at least parts of the 3′MEE. In this region, the in silico approach to identify conserved transcription factor binding sites using multiTF predicted binding sites for the transcription factors E2F, FoxO forkhead proteins, and a GATA factor (Fig. 6A). Furthermore, one stretch of nucleotides (AGXTTGTAAACAXGCTGA) stood out, as it showed almost perfect conservation across all three species. No match to this sequence was found, however, in any of the available databases scanned. Importantly, no evolutionarily conserved sites for any of the transcription factors that had been previously implicated in targeting, including E2A (24–26), NF-κB, Mef2, and Oct1/2 (17), were present in this region. Lastly, based on our current delineation of the border between the transcriptional enhancer and the 3′MEE, the GATA site is a strong candidate for conferring MEE function, as it is present only in the IGL locus of the FragC lines, which showed SHM/GCV, but not in the FragB lines, which show only transcription (Fig. 6B).
With respect to the upstream region of homology, it is conceivable that this area also corresponds to a cis-regulatory element, and hence it might include the 5′MEE of the chicken IGL locus (Fig. 6A). It is important to note, however, that a transcriptional silencer has been annotated in this region as well (Fig. 6A). Interestingly, this putative 5′MEE contains a conserved E-box motif (CAGCTG) that could function similarly to the E-box motifs in the murine Igκ enhancers (24), but surprisingly, there is no overlap between the predicted transcription factor binding sites in the evolutionarily conserved putative 5′MEE and 3′MEE. In summary, our sequence alignment suggests that the sequences of the IGL MEEs are evolutionarily conserved and that their function might be mediated by highly conserved modules containing binding sites for known transcription factors and targeting factors of unknown identity.
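Locating such conserved modules in a new sequence amounts to a simple pattern search. The sketch below scans a sequence for the conserved 18-nt stretch and the E-box motif reported above. It assumes that "X" in the published motif stands for any nucleotide, searches the given strand only, and uses a made-up test sequence; the function names are hypothetical.

```python
import re

CONSERVED_MODULE = "AGXTTGTAAACAXGCTGA"  # motif as printed in the text
E_BOX = "CAGCTG"                          # palindromic E-box motif


def motif_to_regex(motif, wildcard="X"):
    """Compile a motif into a regex; the wildcard matches any of A, C, G, T.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    body = "".join("[ACGT]" if ch == wildcard else ch for ch in motif)
    return re.compile("(?=" + body + ")")


def find_motif(seq, motif):
    """Return 0-based start positions of motif on the given strand of seq."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical sequence: 3-nt spacer, one instance of the conserved module
    # (X positions filled with C and T), then a short tail containing an E-box.
    seq = "TTT" + "AGCTTGTAAACATGCTGA" + "CCCAGCTGAA"
    print(find_motif(seq, CONSERVED_MODULE))
    print(find_motif(seq, E_BOX))
```

A fuller scan would also search the reverse complement, which matters for the non-palindromic 18-nt module but not for the E-box.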
Discussion
AID is a DNA mutator that poses an enormous threat to genomic integrity in B cells when its expression is turned on upon activation. A widely entertained working model proposes that cis-regulatory sequences might play a key role in restricting these processes to Ig genes (27). Our experimental data presented in this study now show that MEEs and the transcriptional enhancers in the chicken IGL locus are physically separable (i.e., MEEs are critical elements beyond transcriptional enhancers to recruit AID-mediated mutagenesis). Currently, we cannot rule out that MEEs also influence transcription. Overall, our findings require a drastic revision of the long-standing model that Ig enhancer elements (defined and identified by narrowing in on sequence elements with strong classical transcription-enhancing features) also harbor the cis-regulatory sequences that target SHM, GCV, and CSR to these gene loci (13). The observation of AID-dependent mutation events in non-Ig genes (e.g., in BCL6) suggests that MEEs are also present outside Ig loci and might be more widely distributed than previously anticipated (6, 8). It is currently unknown whether such non-Ig MEEs are evolutionarily conserved (i.e., whether for example the BCL6 gene of all jawed vertebrates is mutated by AID), but it seems rather unlikely as they might be subject to negative selective pressure. MEEs might also be the defining features of genomic loci with increased instability, like breakpoint cluster regions and recombination hotspots, and might direct translocations that ultimately lead to lymphoma formation.
One of the key questions with respect to targeting of AID-dependent sequence diversification is the identity of minimal MEEs (i.e., the minimal sequence elements that are necessary and sufficient to target SHM, GCV, and CSR to Ig loci). Our analyses revealed the presence of two distinct and largely redundant MEEs in the chicken IGL locus, the 5′MEE and 3′MEE, upstream and downstream of the previously described minimal transcriptional enhancer. These data are consistent with some of the published data obtained in the context of a largely modified IGL locus including a GFP reporter gene (16). Furthermore, downstream of the full transcriptional enhancer defined in this study, we identified a small 222-bp fragment that likely represents a central component of the 3′MEE. Ongoing systematic deletion studies in this region of the IGL locus will help in fine-mapping the boundaries and location of this element.
Using an evolutionary approach, we showed that a 1.7-kb fragment from the zebra finch IGL locus is functional as a SHM/GCV targeting element when placed in the chicken cells. Chicken belongs to the Galloanserae, whereas the zebra finch and California condor are members of the Neoaves, and the last common ancestor of these birds is thought to date back to 105 million years ago (28). Three-way alignments between Ig gene loci sequences of these three bird species focused our attention on smaller highly conserved regions upstream and downstream of the minimal Enh enhancer. The relevance of these regions for IGL biology was strongly supported by the fact that the downstream homology area overlaps with the location of the 3′MEE determined in our functional studies. Importantly, the putative 5′MEE and 3′MEE both harbor short sequence modules with nearly complete evolutionary conservation containing putative binding sites for CDP, CEBPB, E2A, HSF1, E2F, FoxO, and GATA transcription factors, whereas other modules remain orphans (Fig. 6). It is tempting to speculate that subsets of these (individually or in combination) act as binding sites for trans-acting factors mediating targeting. Interestingly, although the 5′MEE and the 3′MEE are largely redundant, there is no overlap between the predicted transcription factor binding sites for the upstream and downstream MEE. As the corresponding transcription factors are rather broadly used in lymphocytes and not unique to activated B cells, we predict that the as-of-yet uncharacterized conserved sequences may hold the key to understanding the mechanism of SHM/GCV targeting.
Sequence comparisons between the bird MEEs and mammalian Ig loci do not reveal any regions of high similarity, but we predict that MEEs exist not only in mammalian Ig genes but in the Ig loci of all jawed vertebrates. As SHM is conserved throughout the jawed vertebrate lineage, we favor a model in which the targeting factors recruited by these cis-regulatory sequences are evolutionarily conserved as well. Hence the identification of the minimal set of targeting factor binding sites required for MEE function will facilitate bioinformatics approaches to determine the location of clusters of such sites (and hence the MEEs) in all Ig loci. Such an approach would also represent a tool toward identifying MEEs in non-Ig genes like BCL6. In an alternative model, functionally conserved factors with altered DNA sequence specificity would mediate targeting, and comparing the Ig loci of multiple phylogenetically closely related species (as done for birds in this paper) would point to conserved cis-regulatory sequences including MEEs. A comparison of the Igκ locus of mouse (Mus musculus) and rat (Rattus norvegicus) does reveal a large number of conserved noncoding sequence segments, but the lack of genomic sequence from a third rodent currently precludes a more stringent approach toward identifying candidates for MEEs in their Ig loci.
Lastly, the mode of action of MEEs at the molecular level remains elusive. As AID is more widely distributed over the genome than previously thought (6, 8, 29), the most simplistic model, the specific recruitment of the AID enzyme to Ig gene loci, is likely incorrect. Thus, Ig locus specific activation of AID and/or recruitment of error-prone polymerases represent plausible mechanisms of MEE function. This would be consistent with a model that MEEs are also present and active in non-Ig loci even outside the B cell lineage, where AID is unlikely to be the initiating factor for genomic instability. The recent description of DNA zip codes, small DNA sequences that drive the localization of repressed genes to the nuclear pore in budding yeast (30), raises the possibility that MEEs might act in a similar fashion to localize Ig loci to distinct subnuclear regions in which error-prone DNA repair occurs.
Acknowledgements
We thank Drs. David R. Wilson, Ranjan Sen, and Shu Yuan Yang for helpful discussions, suggestions, and comments on this manuscript. We also thank Dr. Arthur P. Arnold for providing genomic DNA from a zebra finch and Dr. Oliver A. Ryder for the California condor genomic DNA sample.
Footnotes
This work was entirely supported by the Intramural Research Program of the National Institutes of Health, National Institute on Aging.
The online version of this article contains supplemental material.
References
Disclosures
The authors have no financial conflicts of interest.