Abstract
The autoimmune regulator is a critical transcription factor for generating central tolerance in the thymus. Recent studies have revealed how the autoimmune regulator targets many otherwise tissue-restricted Ag genes to enable negative selection of autoreactive T cells.
The autoimmune regulator (AIRE) gene was identified by positional cloning of the genetic locus linked to a rare autoimmune disease, autoimmune-polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED) (1, 2). The encoded AIRE protein is expressed primarily in medullary thymic epithelial cells (mTECs) (3). AIRE is also expressed in peripheral lymphoid tissues (4), where its contribution to tolerance remains to be investigated. The mouse model of the human APECED disease, the Aire−/− mouse, was instrumental in identifying the cellular role of AIRE, which is to regulate promiscuous expression of tissue-restricted Ag (TRA) genes in mTECs (5). The key finding was that mTECs from Aire−/− versus wild-type mice expressed fewer TRAs. Although this promiscuous gene expression in mTECs had been recognized previously (6), the underlying mechanism remained elusive. To date, AIRE is the only identified transcription factor that regulates this process. Among others, AIRE activates insulin, interphotoreceptor retinoid-binding protein A, and mucin 6 genes, all of which had been linked to autoimmunity in humans or AIRE-deficient mice (7, 8). Expressed TRAs are then processed and presented on the surface of mTECs or taken up by dendritic cells (9, 10). Exposure of maturing T cells to these Ags is critical for negative selection of T cells in the thymus (11, 12). In the absence of AIRE, autoreactive T cells mature and escape into the periphery, which can lead to autoimmunity (7, 8, 13).
Transcription of protein-coding genes occurs in several phases, all of which are regulated (14). First, transcription factors and RNA polymerase II (RNAPII) are recruited to promoters. Although RNAPII initiates transcription, it then pauses because of the action of negative elongation factor (NELF) and 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole sensitivity inducing factor (DSIF). The release of RNAPII to elongation is mediated by the positive transcription elongation factor b (P-TEFb), composed of a regulatory cyclin subunit (CycT1 or CycT2) and the cyclin-dependent kinase 9 (15). P-TEFb phosphorylates subunits of DSIF and NELF as well as serine residues at position 2 (Ser2) in the C-terminal domain (CTD) repeats of the largest RNAPII subunit (RPB1) (16). DSIF is thus converted into an elongation factor, NELF is released from the nascent RNA, and the phosphorylated RNAPII can now elongate (16). The CTD of human RNAPII contains 52 heptapeptide repeats (YSPTSPS). The serines, threonines, and tyrosines in these repeats are phosphorylated by P-TEFb and other kinases in distinct phases of transcription, giving rise to a diverse set of instructions, which are known as the CTD code (17). The phosphorylated elongating RNAPII directs cotranscriptional processing, that is, splicing and polyadenylation of genes (18). Transcription elongation proceeds until termination, upon which RNAPII becomes dephosphorylated, and the cycle begins anew. Recent genome-wide analyses revealed an unexpectedly high abundance of paused RNAPII at most promoters, including those of inactive genes (19–22).
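To give a sense of the combinatorial scope of the CTD code, the following Python sketch counts candidate phosphorylation sites in the CTD. It rests on a simplifying assumption that is not made in the text: every one of the 52 repeats is taken to match the consensus YSPTSPS exactly, whereas repeats in the actual human CTD can deviate from the consensus. The function and variable names are illustrative only.

```python
# Simplifying assumption: all repeats are identical to the consensus heptapeptide.
CONSENSUS_REPEAT = "YSPTSPS"
N_REPEATS = 52  # number of heptapeptide repeats in the human RNAPII CTD

# Residues that can accept a phosphate group (Tyr, Ser, Thr).
PHOSPHO_ACCEPTORS = {"Y", "S", "T"}


def phospho_sites(repeat, n_repeats):
    """Return (repeat_index, position_in_repeat, residue) for each Tyr/Ser/Thr.

    Both indices are 1-based, so position 2 corresponds to Ser2.
    """
    return [
        (r, pos, aa)
        for r in range(1, n_repeats + 1)
        for pos, aa in enumerate(repeat, start=1)
        if aa in PHOSPHO_ACCEPTORS
    ]


sites = phospho_sites(CONSENSUS_REPEAT, N_REPEATS)
ser2_sites = [s for s in sites if s[1] == 2]  # the positions targeted by P-TEFb

print(len(sites))       # candidate sites under the idealized model
print(len(ser2_sites))  # Ser2 positions, one per repeat
```

Under this idealized model each repeat contributes five candidate sites (Tyr1, Ser2, Thr4, Ser5, and Ser7), which illustrates how many distinct phosphorylation patterns a single CTD could in principle carry.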
AIRE is a transcription factor that assembles into oligomers and forms punctate structures that colocalize with CREB-binding protein, P-TEFb, and small nuclear ribonucleoproteins in the nucleus (23–25). The estimated number of AIRE-regulated genes ranges from several hundred to thousands. They have diverse promoters and are regulated by distinct transcription factors in their corresponding tissues. These findings raise the conundrum of how AIRE can regulate such a large repertoire of divergent genes. Recent work from others and us has addressed this question and revealed the mechanisms of action of this enigmatic protein (26–31).
The AIRE protein has a predicted molecular mass of 58 kDa (Fig. 1A) (3). It forms large oligomers, which are detected in a >670-kDa fraction by gel filtration (32). The N terminus of AIRE contains the homogeneously staining region (HSR) and the Sp100, AIRE-1, NucP41/75, and DEAF-1 (SAND) domain (Fig. 1A). Mutations in HSR from APECED patients disrupt multimerization, which is essential for AIRE function (32, 33). The AIRE SAND domain is also involved in this oligomer formation. It lacks the canonical KDWK motif, which is required for DNA binding of other SAND domains (34). Although DNA binding of the AIRE SAND domain had been reported in vitro (35, 36), it appears to be nonspecific (37) and may be irrelevant for its recruitment to target genes in vivo. Indeed, we found that mutating KNKA residues in the AIRE SAND domain, which correspond to the KDWK motif in other members of this family, had no effect on AIRE-induced expression of a plasmid reporter gene (26). In some APECED patients an autosomal-dominant G228W mutation was found in the SAND domain (Fig. 1A), which causes the wild-type AIRE protein to coaccumulate in larger structures that do not colocalize with sites of active transcription (25). Thus, dominant-negative effects of this G228W mutation could be due to an increased affinity for the wild-type AIRE protein. The N terminus of AIRE is required for nuclear localization (38). The nuclear localization signal was mapped to basic residues from positions 131 to 133 (Fig. 1A) (39).
FIGURE 1. AIRE protein domains, key interacting partners, and mechanism of TRA gene expression. (A) AIRE contains several domains that are related to those in other transcription factors. From the N terminus they are: HSR (green), which is important for the oligomerization of AIRE and may function as a caspase recruitment domain (58); SAND (green); PHD1 and PHD2 (violet); proline-rich region (PRR; orange); and the TAD (red). Protein residues corresponding to human AIRE are labeled above, and the position of the four LXXLL motifs is marked below the diagram. The location of the nuclear localization signal is also indicated (KRK). Mutations discussed in this review are marked with an asterisk below the diagram with accompanying labels. Arrows depict interactions between DNA-PK, H3K4, P-TEFb, and AIRE. (B) Schematic diagram depicting the molecular mechanism of AIRE-regulated TRA gene expression. Combinatorial interactions between AIRE, unmodified H3K4 (yellow), DNA-PK (red), and RNAPII (blue) recruit AIRE to a TRA promoter. AIRE brings P-TEFb (red) to phosphorylate the RNAPII CTD, and this results in transcription elongation, mRNA processing, and TRA gene expression. Histone H2AX with phosphorylated Ser139 (γH2AX), which marks DNA double-strand breaks, is depicted in yellow. (C) The Venn diagram marks sets of genes that contain unmodified H3K4, engaged RNAPII, and DNA-PK at their promoters. AIRE is targeted to and can regulate expression of genes at the intersection of these sets.
AIRE also contains two plant homeodomains (PHDs), PHD1 and PHD2 (Fig. 1A). PHDs are zinc fingers closely related to RING domains, which are common in proteins involved in ubiquitylation (40). PHDs are protein–protein interaction modules, which can mediate binding to nucleosomes (41, 42). Such PHDs bind to N-terminal tails of histone H3, discriminating between those trimethylated (H3K4me3) and those unmodified (H3K4) at Lys4. AIRE PHD1 is most closely related to the BHC80 PHD, which interacts selectively with unmodified H3K4 (42). Indeed, AIRE PHD1 also binds to unmodified H3K4 (Fig. 1A), and this interaction is required for AIRE to activate transcription of genes in chromatin (26, 30, 37). The AIRE PHD2 sequence is divergent and does not interact with nucleosomes, but it contributes structurally to the activation of TRA genes by AIRE (27, 31, 43).
AIRE PHD1 also binds to the DNA-dependent protein kinase (DNA-PK) (Fig. 1A) (44). DNA-PK is a nuclear kinase that not only repairs DNA double-strand breaks and mediates V(D)J recombination but also supports transcription and chromatin remodeling (45). DNA-PK also phosphorylates two sites in the N terminus of AIRE (44). However, pharmacological inhibition of DNA-PK and bringing AIRE to a promoter via heterologous DNA tethering in cells lacking DNA-PK revealed that the kinase activity of DNA-PK is dispensable for gene activation by AIRE. Instead, these interactions represent a key mechanism for the recruitment of AIRE to its target genes (26). An important aspect of this targeting is that AIRE interacts with DNA-PK, which is associated with the histone variant H2AX phosphorylated at Ser139 (γH2AX) that marks DNA double-strand breaks (26).
The C terminus of AIRE does not share obvious homology with functional domains in other proteins, but it is highly conserved between human and mouse AIRE proteins. It serves as a transcriptional activation domain (TAD) (Fig. 1A) (27). It binds to P-TEFb and brings it to TRA genes (23, 27). This leads to the phosphorylation of Ser2 in the RNAPII CTD and productive elongation with cotranscriptional processing of nascent mRNA species. The key role of this domain is underscored by an APECED patient mutation that affects only the extreme C terminus in AIRE (Fig. 1A) (46) but completely abolishes its function (27).
AIRE contains four LXXLL motifs, two in the HSR, one in a proline-rich region between the two PHDs, and one in the TAD (Fig. 1A). LXXLL motifs are known to mediate protein–protein interactions between nuclear receptors and their coactivators (47). Their role for AIRE remains to be established.
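Motifs of this kind are straightforward to locate computationally. The following Python sketch scans a protein sequence for LXXLL motifs, where X is any residue. The peptide used here is a made-up toy sequence chosen for illustration; it is not the AIRE sequence, and the function name is hypothetical.

```python
import re

# L, any two residues, then LL. The lookahead lets overlapping motifs be found.
LXXLL_PATTERN = re.compile(r"(?=(L..LL))")


def find_lxxll(sequence):
    """Return (1-based start position, matched motif) for every LXXLL motif."""
    return [
        (m.start() + 1, m.group(1))
        for m in LXXLL_PATTERN.finditer(sequence.upper())
    ]


# Toy peptide for illustration only (NOT the AIRE sequence).
toy_peptide = "MALRRLLKGGLAALLP"
motifs = find_lxxll(toy_peptide)
print(motifs)
```

Applied to a real sequence retrieved from a protein database, the same scan would report the positions of the motifs marked in Fig. 1A; pattern matching alone, however, says nothing about whether a given motif is functional.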
Armed with all of this knowledge of interacting proteins and chromatin targeting strategies of AIRE, we can now propose a model for the AIRE-regulated promiscuous TRA gene expression in mTECs. The defining feature of AIRE target genes is that although they are inactive, they have already engaged RNAPII on their promoters (Fig. 1B) (23, 27, 29). This means that the basal transcriptional machinery is already in place, but RNAPII is stalled or generates only sterile and unprocessed transcripts, which are unstable (48, 49). Because this RNAPII is not phosphorylated at its CTD, it also fails to recruit chromatin-modifying machineries. Thus, H3K4 remains unmodified, leading to PHD1-mediated recruitment of AIRE to such inactive genes (Fig. 1B) (30, 37). Initiation of transcription has been associated with topoisomerase II–catalyzed formation of transient double-strand DNA breaks, leading to DNA-PK recruitment to initiating RNAPII (50, 51). DNA-PK–associated AIRE is thereby brought into close proximity of the initiating RNAPII (Fig. 1B). Because the binding between AIRE PHD1 and unmodified H3K4 is insufficient to target AIRE to specific TRA genes (52), only the combinatorial interactions of AIRE with all its partners, the unmodified H3K4, DNA-PK, and engaged RNAPII, lead to the recruitment of AIRE with associated P-TEFb to these transcription units (Fig. 1C). Levels and distribution of these factors at distinct TRA genes will vary from cell to cell, thereby giving rise to the seemingly stochastic nature of TRA gene expression by AIRE.
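The combinatorial targeting logic of this model (Fig. 1C) can be expressed as a set intersection. The Python sketch below is a deliberately simplified, all-or-none illustration: it treats each promoter feature as present or absent, whereas the model above depends on levels and distribution of these factors. The gene names and per-cell states are hypothetical placeholders, not experimental data.

```python
def aire_candidate_targets(unmodified_h3k4, engaged_rnapii, dna_pk):
    """Return genes whose promoters carry all three features.

    Each argument is a collection of gene names whose promoter has
    unmodified H3K4, engaged RNAPII, or DNA-PK, respectively.
    """
    return set(unmodified_h3k4) & set(engaged_rnapii) & set(dna_pk)


# Hypothetical promoter states in two individual mTECs.
cells = {
    "mTEC_1": {
        "unmodified_h3k4": {"gene_A", "gene_B", "gene_C"},
        "engaged_rnapii": {"gene_B", "gene_C", "gene_D"},
        "dna_pk": {"gene_C", "gene_D"},
    },
    "mTEC_2": {
        "unmodified_h3k4": {"gene_A", "gene_B"},
        "engaged_rnapii": {"gene_A", "gene_B", "gene_D"},
        "dna_pk": {"gene_B", "gene_D"},
    },
}

targets_by_cell = {
    name: aire_candidate_targets(**features) for name, features in cells.items()
}

for name, targets in targets_by_cell.items():
    print(name, sorted(targets))
```

Because the three sets differ between the two hypothetical cells, the intersection differs as well, which mirrors the seemingly stochastic, cell-to-cell variation in TRA gene expression described above.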
However, several other aspects of AIRE remain to be investigated. For example, how is its transcription regulated in mature mTECs or in peripheral lymphoid tissues (53, 54)? There exist three splice variants of human AIRE, some of which lack the N-terminal domains required for the formation of oligomers (2). Do they play any role in central tolerance? Furthermore, several posttranslational modifications of AIRE have been reported (44, 55, 56). How do they affect the function of AIRE, and what directs these transcriptional and posttranslational events to establish central tolerance? Finally, P-TEFb, the essential coactivator of AIRE, is itself tightly regulated in cells. Among other stimuli, P-TEFb can also be activated by cellular stress (57). Thus, it is likely that increased numbers of DNA breaks in mTECs also activate P-TEFb (28), thereby further enhancing the expression of TRA genes to optimize the establishment of central tolerance.
Conclusions
AIRE is a transcription factor that activates the expression of TRA genes in mTECs. Their promoters must be occupied by RNAPII, unmodified H3K4, and DNA-PK. Sufficient levels of these proteins ensure that AIRE is recruited to these sites. AIRE oligomers then bring P-TEFb to RNAPII, which leads to its extensive phosphorylation. Thus modified, RNAPII is competent for elongation and cotranscriptional processing of target genes, which leads to the expression of TRAs and their presentation to T cells via MHC class II determinants.
Footnotes
Abbreviations used in this article:
- AIRE
autoimmune regulator
- APECED
autoimmune-polyendocrinopathy-candidiasis-ectodermal dystrophy
- CTD
C-terminal domain
- DNA-PK
DNA-dependent protein kinase
- DSIF
5,6-dichloro-1-β-D-ribofuranosylbenzimidazole sensitivity inducing factor
- H3K4
histone H3 unmodified at Lys4
- HSR
homogeneously staining region
- NELF
negative elongation factor
- PHD
plant homeodomain
- P-TEFb
positive transcription elongation factor b
- RNAPII
RNA polymerase II
- SAND
Sp100, AIRE-1, NucP41/75, and DEAF-1
- TAD
transcriptional activation domain
- TRA
tissue-restricted Ag.
References
Disclosures
The authors have no financial conflicts of interest.