Thymic regulatory T cells (tTreg) are critical in the maintenance of normal T cell immunity and tolerance. The role of TCR in tTreg selection remains incompletely understood. In this study, we assessed TCRα and TCRβ sequences of mouse tTreg and thymic conventional CD4+ T cells (Tconv) by high-throughput sequencing. We identified αβ TCR sequences that were unique to either tTreg or Tconv and found that these were distinct, as recognized by a machine learning algorithm and by preferentially used amino acid trimers in αβ CDR3 of tTreg. In addition, a proportion of αβ TCR sequences expressed by tTreg were also found in Tconv, and machine learning classified the great majority of these shared αβ TCR sequences as characteristic of Tconv and not tTreg. These findings identify two populations of tTreg, one in which the regulatory T cell fate is associated with unique properties of the TCR and another with TCR properties characteristic of Tconv for which tTreg fate is determined by factors beyond TCR sequence.

The ability to generate a rapid and sustained T cell response to external pathogens and transformed malignant cells is essential for the protection of the host. At the same time, deletion of autoreactive T cells during development and repression of excessive or autoreactive responses in peripheral tissues is essential to proper T cell protective function (1). This duality in the regulation of T cell function is accomplished by two types of T cells: conventional T cells that provide “helper” (CD4+) and “killer” (CD8+) functions and regulatory T cells (Treg) that suppress conventional T cell–dependent responses. Treg have been assigned to two subsets based on the origin of their generation: thymic Treg (tTreg) (or natural Treg) that develop in the thymus (2–5) and peripheral Treg generated in the periphery from thymic conventional CD4+ T cells (Tconv) under specific conditions (6–8). The development of tTreg appears to require both TCR signals and other factors, such as costimulatory signaling and cytokines, but the precise mechanisms of tTreg generation have not been fully elucidated.

A key factor in tTreg generation is the specificity of the TCR whose interaction with self-antigen/MHC plays a critical role in tTreg differentiation. Several studies using transgenic (Tg) mouse models suggest that the signal strength of TCR recognition of self-antigen/MHC ligand differs in tTreg and Tconv, with tTreg differentiation involving a higher level of signal strength (9, 10). Indeed, the disruption of normal self-antigen/MHC ligand expression in the thymus because of Aire deficiency causes a change in the fate of self-antigen–specific T cells from tTreg to Tconv (11, 12). Subsequent studies suggest that Ag presentation by different APCs (classical and plasmacytoid dendritic cells, cortical and medullary thymic epithelial cells, and B cells) at different thymic locations (cortex and medulla) influences the deletion of autoreactive thymocytes and the differentiation of tTreg (3, 13). In addition, factors such as cytokines (including IL-2 and TGF-β) (14–16) and costimulatory receptors (CD28) (17) have also been implicated in the development of tTreg. Collectively, it is clear that no single factor alone determines differentiation to the tTreg fate, but precisely how these factors act in combination remains to be determined.

Initial analyses of TCR sequences in Treg and Tconv of TCRα or TCRβ Tg mice reported that CDR3α and CDR3β sequence repertoires of Treg and Tconv are different either by exclusive appearance in only one of these lineages or by their relative abundance in Treg or Tconv when sequences were found in both cell types (18, 19). Subsequent studies using high-throughput sequencing generated larger numbers of TCR sequences in Treg and Tconv. Studies using TCRβ Tg mice to compare TCRα sequences between tTreg and Tconv reported little overlap of TCRα sequence between tTreg and effector T cells (20) or between tTreg and Tconv that recognize the same foreign Ag (21). Study of a TCRα Tg mouse to compare TCRβ sequences between Treg and Tconv from spleen and peripheral lymph nodes found that 12% of TCRβ sequences are shared by peripheral Treg and Tconv and are thus presumed to be derived from common progenitors (22). However, there has been no reported deep sequencing analysis examining endogenous TCRα and TCRβ of tTreg and Tconv from the thymus of non-TCR Tg mice. It is therefore unclear what degree of TCR sequence uniqueness and similarity exists overall between tTreg and Tconv or, importantly, whether there are general sequence features that distinguish αβ TCR of tTreg from those of Tconv.

In this study, we addressed the role of TCR sequence in determining whether T cells develop into tTreg or Tconv lineages. We report a comprehensive comparison of TCRα and TCRβ sequences of tTreg and Tconv using a Unique Molecular Identifier (UMI) methodology incorporating a 5′ single universal primer for PCR amplification of all V genes, significantly reducing PCR bias of amplification and sequencing errors affecting the quantitation of TCR frequency (23, 24). Comparison of TCRα and TCRβ sequences between tTreg and Tconv from two normal mouse strains revealed that, although many sequences were unique to either Treg or Tconv, a substantial proportion of TCRα (21–30%) and TCRβ (5–20%) sequences from tTreg were also found in Tconv. Analysis of a TCRβ Tg mouse line revealed an even higher proportion (71%) of TCRα sequences found in tTreg that were also found in Tconv. Interestingly, these shared TCRα clonotypes that were common to tTreg and Tconv were significantly more abundant than nonshared TCRα sequences of tTreg and Tconv. Finally, we used machine learning (ML) to develop an algorithm that was capable of distinguishing nonshared TCRα and TCRβ sequences expressed by tTreg from those of Tconv and, in addition, found that specific amino acid trimers were differentially expressed in either tTreg or Tconv. When we applied the same ML algorithm to an analysis of those TCR sequences that were shared by tTreg and Tconv, the vast majority of these sequences were classified as characteristic of Tconv and not tTreg. Taken together, our findings identify TCR sequence characteristics that bias to tTreg or Tconv fate, in addition to the presence of factors that can drive cells with an identical TCR sequence into either Tconv or tTreg lineages.

tTreg and Tconv were isolated from 4- to 8-wk-old mice of three strains, all on a C57BL/6 background: 1) Rag-GFP-Foxp3-RFP (25, 26): tTreg and Tconv were isolated from three individual mice based on GFP and Foxp3-RFP expression (Supplemental Fig. 1) and were sequenced independently; 2) TcrdCreERZsGreen-Foxp3-RFP: TcrdCreER knock-in with tamoxifen-induced ZsGreen reporter for TCRδ expression (three doses of 1 mg tamoxifen i.p. every other day, cells isolated 2 wk after last injection) (27): tTreg and Tconv cells were isolated from nine mice based on expression of ZsGreen, Foxp3-RFP, and CD25 (Supplemental Fig. 1), and sequencing was performed on three pools (three mice pooled in one sample); and 3) Foxp3-GFP TCRα+/− TCRβ-Tg mice carrying the DO11.10 TCRβ (28, 29): tTreg and Tconv samples from three individual mice were sorted as Foxp3 reporter positive and negative, respectively, and individual samples were sequenced independently. Foxp3-GFP mice were provided by Vijay Kuchroo (30). All mice were maintained under specific pathogen-free conditions at the animal facilities of the National Cancer Institute and Duke University. Animal procedures were reviewed and approved by the National Institutes of Health or Duke Institutional Animal Care and Use Committee.

cDNA library construction was previously described (24). Briefly, total RNA was isolated from tTreg and Tconv using a QIAGEN RNeasy Micro Kit. Isolated total RNA (50–500 ng) was used for cDNA synthesis using TCRα and TCRβ C region–specific primers mTRAC1 (5′-GGCGTTGGTCTCTTTGAAG-3′) and mTRBC1 (5′-CACTTGTCCTCCTCTGAAAG-3′) (all oligos made by Eurofins USA), SMARTScribe Reverse Transcriptase (Takara Bio), and SmartN oligos (5′-AAGCAGUGGTAUCAACGCAGAGUNNNNUNNNNUNNNNUCTTrGrGrGrGp-3′) for template switching at the 5′ end to incorporate a UMI and M1SS sequence for PCR. The cDNA products were treated with uracil-DNA glycosylase (New England BioLabs) at 37°C for 30 min to remove SmartN oligos.

Three rounds of PCR using Super Fidelity Platinum Taq DNA Polymerase (Thermo Fisher Scientific) were performed to prepare libraries for sequencing. The first PCR (18–24 cycles) was used to enrich TCRs using M1SS (5′-AAGCAGTGGTATCAACGCA-3′, part of SmartN used as a 5′ PCR primer) and TCR C region primers (mTRAC2: 5′-CGGCACATTGATTTGGGAG-3′ and mTRBC2: 5′-TGTGGACCTCCTTGCCATTC-3′), and primers were removed by the QIAquick PCR Purification Kit (QIAGEN). The second PCR (20–32 cycles) was used to add an 8-bp sample barcode to each sample at 5′ end (P7M1S-n: 5′-CGTGTGCTCTTCCGATC(N)1–2-NNNNNNNN(8 bp barcode)-CAGTGGTATCAACGCAGAG-3′) and internal C region primers (mTRAC3: 5′-AGGTTCTGGGTTCTGGATG-3′ and mTRBC3: 5′-GGTGGAGTCACATTTCTCAG-3′). PCR products were separated by 2% agarose (UltraPure; Thermo Fisher Scientific) gel electrophoresis, and DNA fragments (400–800 bp) were further purified by a QIAquick Gel Extraction Kit (QIAGEN). Purified DNAs of each sample were quantitated by a BioAnalyzer (Agilent Technologies) and combined for the third round of PCR (10 cycles), which incorporates the Illumina adaptor (P7: 5′-CAAGCAGAAGACGGCATACGAGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC-3′, mP5TA: 5′-AATGATACGGCGACCACCGATCGTCGAGGTTCTGGGTTCTGGATG-3′ and mP5TB: 5′-AATGATACGGCGACCACCGATCGTCGGGTGGAGTCACATTTCTC-3′). Amplified DNA was separated by 2% agarose gel electrophoresis and further purified by QIAquick Gel Extraction and followed by the PCR purification kits. The amount of purified DNA was measured using a Qubit Fluorometer (Thermo Fisher Scientific). Fifty picomoles of DNA were used for sequencing on an Illumina HiSeq 2500 system. A modified paired end sequencing protocol was used: TCR-specific sequencing primers TRA (5′-TCGTCGAGGTTCTGGGTTCTGGATG-3′) and TRB (5′-TCGTCGGGTGGAGTCACATTTCTCAG-3′) were used for first round sequencing of 150 bps. Illumina RD2 primer (5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC-3′) was used for second round sequencing of 50 bp, covering the sample barcode and UMI.

The raw sequences were first separated by the sample barcode using a custom Python script. The sequences were then filtered through a custom Python script to separate erroneous sequences under each UMI cluster. Sequences that had a successfully identified sample barcode and passed this filter were then processed via the Molecular Identifier Groups-Based Error Correction program (23), which assembles consensus sequences for each UMI cluster and identifies the V, J, and CDR3 sequences via BLAST. Post–Molecular Identifier Groups-Based Error Correction analysis included the removal of contaminated TCR sequences using a custom Python script. A successful TCR had to meet the following criteria: each UMI supported by ≥3 sequence reads, a known V and J gene, and an intact and functional CDR3 amino acid sequence. A TCR clonotype was defined as a unique combination of V, J, and CDR3 amino acid sequences. CDR3 length was based on the definition of the International Immunogenetics Information System (31). V-J gene usage, CDR3 length distribution, the percentage of public TCRs, and the overlap of TCR clonotypes between different mice and subsets were calculated using custom Python scripts. For V and J usage analysis, the percentages were calculated based on both total distinct TCR sequences and UMI counts. Calculation of overlapping TCR clonotypes relied on the definition of the TCR as the V–CDR3–J sequence. We used the DivE (32) R program to estimate the species richness, and the geometric mean of the top models is presented.
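The acceptance criteria and clonotype definition above can be illustrated with a minimal Python sketch. It is not the custom script used in this study: the record keys (`v`, `j`, `cdr3_aa`, `reads`), the function name `collapse_clonotypes`, and the convention that a nonfunctional CDR3 is flagged by `*` (stop codon) or `_` (frameshift) are all assumptions made for illustration.

```python
from collections import defaultdict


def collapse_clonotypes(umi_records, min_reads=3):
    """Collapse UMI-level consensus records into clonotype -> UMI count.

    umi_records: iterable of dicts, one per UMI consensus, with the
    hypothetical keys 'v', 'j', 'cdr3_aa' and 'reads'.
    """
    counts = defaultdict(int)
    for rec in umi_records:
        if rec["reads"] < min_reads:
            continue  # UMI supported by fewer than 3 reads
        if not rec["v"] or not rec["j"]:
            continue  # V or J gene could not be assigned
        cdr3 = rec["cdr3_aa"]
        # Assumption: '*' marks a stop codon and '_' a frameshift.
        if not cdr3 or "*" in cdr3 or "_" in cdr3:
            continue
        # Clonotype = unique combination of V, CDR3 amino acids and J.
        counts[(rec["v"], cdr3, rec["j"])] += 1
    return dict(counts)
```

In this sketch the value stored for each clonotype is the number of accepted UMIs, which corresponds to the UMI counts used for abundance throughout the analysis.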

Overlap of TCRα and TCRβ sequences between different mice/samples of the same strain of mice or between two strains of mice was analyzed at two levels: 1) overlapping distinct TCR sequences (Eq. 1) and 2) overlapping total sequences based on UMI counts (Eq. 2). The overlapping sequences are presented as percentages for each pairwise comparison.

Eq. 1 was used for the calculation of the overlapping percentage of distinct TCR sequences, as follows:

(1)

S1 and S2 refer to two individual samples of the same strain of mice or two different strains of mice.

Eq. 2 was used for the calculation of the overlapping percentage of total TCR sequences, as follows:

(2)

The sum of UMI counts of TCRs found in both S1 and S2 is calculated as UMI counts of TCRs found in both S1 and S2 in S1 plus UMI counts of TCRs found in both S1 and S2 in S2.
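The two overlap measures can be sketched in Python as follows. Because the equations themselves are not reproduced in this text, the denominators used here are assumptions: the union of distinct clonotypes of S1 and S2 for the distinct-sequence overlap, and the summed UMI counts of both samples for the total-sequence overlap. Only the numerator of the total-sequence overlap follows directly from the description above. The function name `overlap_percentages` is likewise hypothetical.

```python
def overlap_percentages(s1, s2):
    """Return (distinct_pct, total_pct) for two samples.

    s1, s2: dicts mapping a clonotype (e.g. a V-CDR3-J tuple) to its UMI count.
    """
    shared = set(s1) & set(s2)
    union = set(s1) | set(s2)
    if not union:
        return 0.0, 0.0

    # Distinct-sequence overlap. Assumed denominator: all distinct
    # clonotypes observed in either sample.
    distinct_pct = 100.0 * len(shared) / len(union)

    # Total-sequence overlap. Numerator: UMI counts of shared clonotypes
    # in S1 plus their UMI counts in S2 (as described in the text).
    shared_umis = sum(s1[c] for c in shared) + sum(s2[c] for c in shared)
    # Assumed denominator: all UMI counts in S1 plus all UMI counts in S2.
    total_umis = sum(s1.values()) + sum(s2.values())
    total_pct = 100.0 * shared_umis / total_umis
    return distinct_pct, total_pct
```

For example, with S1 = {A: 10, B: 5, C: 1} and S2 = {A: 4, D: 4}, one of four distinct clonotypes is shared (25%), and the shared clonotype carries 14 of the 24 UMIs in total.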

To build an ML classifier to analyze αβ TCR sequences from tTreg and Tconv, we converted the TCR CDR3 sequences from all three strains of mice used in these studies into a matrix of counts of contiguous 3-aa motifs (3-mers or trimers). We first enumerated all 3-mers observed in all CDR3s (the 3-mer library) and then embedded each CDR3 into a length L vector, where L is the number of distinct 3-mers observed in our data and each entry is the number of times that 3-mer appears in the CDR3; 3-mers in the library that were not found in the analyzed TCR were set to 0. To calculate the relative starting location of each 3-mer, we recorded where the 3-mer appeared in the CDR3 relative to the 3-mer starting locations that are possible for that CDR3 length (e.g., a length 12 CDR3 has 10 possible 3-mer locations). To account for multiple copies of the same 3-mer in a TCR, we used a Python dictionary to keep track of the number of occurrences of each 3-mer in a particular TCR. These vectors were then combined into an N × L matrix, where N is the number of CDR3s. Furthermore, we vectorized the V/J information and concatenated it with the N × L 3-mer matrix. Suppose there are M possible V genes and K possible J genes; each CDR3’s V gene information is embedded into a length M vector, in which the entry that corresponds to the V gene of the TCR is set to 1 and the rest to 0, producing an N × M matrix. The same procedure was applied to the length K J gene vector to produce an N × K matrix. To generate the final matrix for ML, we concatenated the three matrices to produce a matrix of size N × (L + M + K). For two-class classification, we trained a random forest binary classifier, which takes in a length (L + M + K) vector and predicts the compartment to which a CDR3 belongs. We used 70% of the distinct TCR sequences as a training set and 30% of the sequences as a testing set.
Training was performed using the random forest classifier from scikit-learn with 150 trees and default settings for other parameters (33). Model performance was evaluated by calculating the area under the receiver operating characteristic (ROC) curve using the scikit-learn metrics package. The ROC curve was plotted using Python’s matplotlib library (https://ieeexplore.ieee.org/document/4160265/). Scripts used for sequence processing and analysis can be made available upon request.
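The construction of the N × (L + M + K) feature matrix can be sketched in plain Python as below. This is an illustration rather than the scripts used in the study: the function names `trimers` and `build_feature_matrix` are hypothetical, the column order (sorted trimers, then sorted V genes, then sorted J genes) is an arbitrary choice, and the relative-position information described above is omitted for brevity.

```python
def trimers(cdr3):
    """All contiguous 3-mers of a CDR3; a length-n CDR3 yields n - 2 of them."""
    return [cdr3[i:i + 3] for i in range(len(cdr3) - 2)]


def build_feature_matrix(tcrs):
    """tcrs: list of (v_gene, cdr3_aa, j_gene) tuples.

    Returns (rows, column_names); each row has L + M + K entries.
    """
    trimer_lib = sorted({t for _, cdr3, _ in tcrs for t in trimers(cdr3)})
    v_genes = sorted({v for v, _, _ in tcrs})
    j_genes = sorted({j for _, _, j in tcrs})
    t_idx = {t: i for i, t in enumerate(trimer_lib)}
    v_idx = {v: i for i, v in enumerate(v_genes)}
    j_idx = {j: i for i, j in enumerate(j_genes)}
    n_l, n_m, n_k = len(trimer_lib), len(v_genes), len(j_genes)

    rows = []
    for v, cdr3, j in tcrs:
        row = [0] * (n_l + n_m + n_k)
        for t in trimers(cdr3):
            row[t_idx[t]] += 1              # count repeated 3-mers
        row[n_l + v_idx[v]] = 1             # one-hot V gene
        row[n_l + n_m + j_idx[j]] = 1       # one-hot J gene
        rows.append(row)
    return rows, trimer_lib + v_genes + j_genes
```

Rows built this way could then be split 70/30 (for example with scikit-learn's `train_test_split(..., test_size=0.3)`) and passed to `RandomForestClassifier(n_estimators=150)`, matching the settings reported above.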

To compare trimer enrichment in the non-V/J portions of the CDR3, we first removed 3 aa from each end of the CDR3 and used the central CDR3 sequences for trimer analysis. Next, to compare the relative abundance of a particular amino acid trimer in tTreg and Tconv, we created a 2 × 2 contingency table for each trimer and then computed the p value with a χ2 test using the Python library SciPy (34). We corrected for multiple comparisons using the Benjamini-Hochberg procedure. The significantly enriched trimers in tTreg were defined as those that met the criteria tTreg/Tconv ratio ≥1.5 and false discovery rate (FDR) ≤ 0.05. The percentages of amino acid usage in the enriched trimers (Supplemental Table I) were calculated as the sum of each amino acid in the enriched trimers multiplied by their respective UMI counts, divided by the total number of amino acids based on UMI counts in these enriched trimers. The percentages of amino acid usage in all trimers were calculated as the sum of each amino acid in all the trimers multiplied by their respective UMI counts, divided by the total number of amino acids based on UMI counts in all trimers unique to either tTreg or Tconv.
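The enrichment test can be sketched with the standard library alone, as below. Several details are assumptions rather than the study's implementation: the 2 × 2 table is taken as (trimer count, all other trimer counts) for tTreg versus Tconv, the χ2 test is computed without continuity correction (SciPy's `chi2_contingency` applies Yates' correction to 2 × 2 tables by default), the tTreg/Tconv ratio is taken as a ratio of trimer frequencies, and all function names are hypothetical.

```python
import math


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square p value (1 df, no continuity correction)
    for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from largest p value to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_trimers(treg_counts, tconv_counts, min_ratio=1.5, max_fdr=0.05):
    """Trimers with tTreg/Tconv frequency ratio >= min_ratio and FDR <= max_fdr.

    Both inputs map trimer -> count and are assumed to be non-empty.
    """
    treg_total = sum(treg_counts.values())
    tconv_total = sum(tconv_counts.values())
    names = sorted(set(treg_counts) | set(tconv_counts))
    pvals = []
    for t in names:
        a = treg_counts.get(t, 0)
        c = tconv_counts.get(t, 0)
        pvals.append(chi2_2x2_pvalue(a, treg_total - a, c, tconv_total - c))
    fdrs = benjamini_hochberg(pvals)
    hits = []
    for t, q in zip(names, fdrs):
        f_treg = treg_counts.get(t, 0) / treg_total
        f_tconv = tconv_counts.get(t, 0) / tconv_total
        if q <= max_fdr and f_treg > 0 and f_treg >= min_ratio * f_tconv:
            hits.append(t)
    return hits
```

Comparing frequencies as `f_treg >= min_ratio * f_tconv` avoids a division by zero for trimers that are absent from Tconv.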

The Mann–Whitney U test was used to test for significant differences in UMI/TCR ratios between shared and nonshared TCRs within tTreg and Tconv. A p value <0.05 was considered significant.

The authors state that all data generated during this study are included in the article, its supplementary information file, and the Source Data file and are available from the corresponding author upon reasonable request. The raw TCR sequence data were deposited in the National Center for Biotechnology Information BioProject database with accession number 541952 (https://www.ncbi.nlm.nih.gov/bioproject/?term=541952).

To analyze αβ TCR repertoires of tTreg and Tconv, we isolated similar numbers of recently generated tTreg and Tconv from the thymus of Rag-GFP/Foxp3-RFP (25, 26) and TcrdCreERZsGreen-Foxp3-RFP mice (27) by cell sorting and applied a high-throughput sequencing method with UMI labeling of TCR mRNA. The Rag-GFP/Foxp3-RFP strain marks newly generated tTreg and Tconv with the green fluorescent protein GFP (25, 26), whereas the TcrdCreERZsGreen-Foxp3-RFP strain labels newly produced tTreg and Tconv with the fluorescent protein ZsGreen after tamoxifen induction (27). These fluorescent markers were used in flow cytometric isolation of tTreg and Tconv to ensure the thymic origin of tTreg and Tconv by excluding the potential contamination of recirculating T cells (Supplemental Fig. 1A, 1B). We analyzed a total of 3.7 × 10⁵ tTreg and 3.0 × 10⁵ Tconv from three Rag-GFP/Foxp3-RFP mice and found that the estimated size of TCRα and TCRβ repertoires, identified as the number of distinct sequences, was comparably diverse between tTreg and Tconv when similar cell numbers were analyzed (Table I). We also isolated tTreg (6.0 × 10⁴ cells) and Tconv (7.2 × 10⁴ cells) from TcrdCreERZsGreen-Foxp3-RFP mice by cell sorting and determined their TCRα and TCRβ repertoires. Again, we found that the sizes of estimated TCRα and TCRβ repertoires were comparably diverse in similar numbers of Tconv and tTreg in TcrdCreERZsGreen-Foxp3-RFP mice (Table I). However, it should be noted that the total TCR repertoire size of Tconv in a mouse is likely larger than that of tTreg when adjusted for the actual number of cells in the thymus. Consistent with this is the observation that there were higher percentages of overlap in tTreg TCRs (3.4–5.2%) than in Tconv TCRs (1.8–2.1%) between individual mice (Rag-GFP/Foxp3-RFP) or different samples (TcrdCreERZsGreen-Foxp3-RFP) (Supplemental Fig. 2A, 2B).
Because we analyzed TCRα and TCRβ repertoires separately, it could not be determined from this analysis whether the αβ combinatorial TCR repertoire was also comparable between tTreg and Tconv. V gene usage and CDR3 length distributions of TCRα and TCRβ were also not substantially different between tTreg and Tconv (Supplemental Fig. 3A–D).

Table I.
Summary of TCRα and TCRβ repertoire of tTreg and Tconv

Cell Type (Phenotype) | Strain of Mouse | Cell No. (a) | Distinct TCRα | UMI Counts (TCRα) | Estimated TCRα (b) | Distinct TCRβ | UMI Counts (TCRβ) | Estimated TCRβ
--- | --- | --- | --- | --- | --- | --- | --- | ---
Treg (GFP+RFP+) | Rag-GFP-Foxp3-RFP | 370,000 | 4,592 | 439,728 | 13,747 | 2,532 | 164,510 | 8,244
Tconv (GFP+RFP−) | Rag-GFP-Foxp3-RFP | 300,000 | 3,899 | 246,106 | 14,350 | 2,652 | 185,677 | 7,310
Treg (ZsGreen+Foxp3+CD25+) | TcrdCreERZsGreen-Foxp3-RFP | 59,990 | 1,716 | 216,004 | 9,919 | 1,837 | 145,782 | 2,817
Tconv (ZsGreen+Foxp3−CD25−) | TcrdCreERZsGreen-Foxp3-RFP | 71,760 | 4,335 | 291,632 | 10,219 | 2,693 | 240,464 | 3,547
Treg (Foxp3+CD25+) | TCRα+/− TCRβ-Tg Foxp3-GFP | 317,700 | 971 | 79,329 | 3,021 | | |
Tconv (Foxp3−CD25−) | TCRα+/− TCRβ-Tg Foxp3-GFP | 1,500,000 | 14,741 | 215,539 | 19,099 | | |

(a) Cell number, distinct TCR, UMI counts, and estimated TCR are derived from the data of the combined three samples.
(b) Estimated TCR repertoires were calculated using DivE (32).

Next, we determined the degree to which αβ TCR sequences of tTreg and Tconv of normal non-TCR Tg mice are similar or distinct. To overcome the limited numbers of cells available from each individual mouse, in particular for tTreg populations, we combined TCRα and TCRβ sequences of tTreg and Tconv from three samples of each strain of mice and then compared these pooled sequence sets (Fig. 1). In Rag-GFP/Foxp3-RFP mice, we found that 12% of distinct tTreg TCRα sequences (580 out of total 4906 TCRα sequences from tTreg) were found in Tconv and accounted for 14% of distinct TCRα Tconv sequences. Those shared TCRα sequences were more abundant than nonshared sequences as they accounted for 21 and 25% of total tTreg and Tconv sequences (based on UMI counts), respectively (Fig. 1A). Compared with TCRα, the overlap in TCRβ between tTreg and Tconv was slightly lower, accounting for 11 and 10% of distinct TCRβ in Treg and Tconv, respectively (Fig. 1C). These overlapping TCRβ sequences were also more abundant, accounting for 20 and 26% of total Treg and Tconv (Fig. 1C). TCRα and TCRβ sequences found in both tTreg and Tconv were also observed in TcrdCreERZsGreen-Foxp3-RFP mice (Fig. 1B, 1D). Collectively, these findings showed that ∼9–12% TCRα and 2–11% TCRβ sequences of tTreg were found in thymic Tconv, and they accounted for 21–30% of TCRα and 5–20% of TCRβ in total tTreg. With the caveat that these data derived from individual TCRα and TCRβ sequences do not directly measure αβ pairing, these findings indicate that TCR sequence is not the only factor determining tTreg generation in thymus.

FIGURE 1.

Number and percentages of distinct and shared TCRα and TCRβ sequences between tTreg and Tconv from two lines of normal mice. (A) Shared TCRα clonotypes in tTreg and Tconv of Rag-GFP/Foxp3-RFP mice. The number and the proportion (%) of sequences found in tTreg only, in Tconv only, or in both tTreg and Tconv (overlap or shared) are presented as a function of all distinct TCRα clonotypes (headed as distinct sequences) and as a proportion of the total UMI counts corresponding to these TCRα (headed as total sequences). (B) Shared TCRα sequences in tTreg and Tconv of TcrdCreERZsGreen-Foxp3-RFP mice. (C) Shared TCRβ sequences in tTreg and Tconv of Rag-GFP/Foxp3-RFP mice. (D) Shared TCRβ sequences in tTreg and Tconv of TcrdCreERZsGreen-Foxp3-RFP mice.

To more directly analyze αβ combinatorial TCR expression in tTreg and Tconv, we analyzed TCRα sequences of tTreg and Tconv from TCRα+/− TCRβ Tg-Foxp3-GFP mice (Table I). Each T cell from these mice expresses a single TCRα (because only one TCRα allele is expressed in these TCRα+/− heterozygotes) in combination with the Tg TCRβ (the Tg TCRβ accounted for 99.99% of TCRβ sequences) so that the TCRα repertoire reflects overall αβ TCR clonotype expression. Again, we pooled TCRα sequences of three mice and compared tTreg and Tconv. Strikingly, we found that 42% (483 out of total of 1159) of distinct TCRα sequences of tTreg were shared with Tconv, accounting for 71% of total tTreg TCRα sequences (Fig. 2). Shared TCRα sequences represented only 2% of distinct TCRα of Tconv and accounted for 18% of Tconv sequences (Fig. 2). These results showed that approximately half of tTreg expressed αβ TCR identical to those expressed by Tconv in this Tg mouse, indicating that the TCR sequence is not the sole determinant of Treg fate for this large proportion of tTreg.

FIGURE 2.

Number and percentages of distinct and shared TCRα sequences between tTreg and Tconv of TCRα+/− TCRβ Tg-Foxp3-GFP mice. The number and the proportion (%) of sequences found in tTreg only, in Tconv only, or in both tTreg and Tconv (overlap or shared) are presented as a function of all distinct TCRα clonotypes (headed as distinct sequences) and as a proportion of the total UMI counts corresponding to these TCRα (headed as total sequences).

To further characterize the αβ TCR sequences shared between tTreg and Tconv, we analyzed their relative abundance by UMI counts of each TCR in two normal mouse strains as well as in the TCRβ Tg strain. We grouped each of the TCRα and TCRβ sequences into four groups: 1) found only in tTreg, 2) found only in Tconv, 3) shared sequences expressed in tTreg, and 4) shared sequences expressed in Tconv. We found that shared TCRα sequences were significantly more abundant (two to nine times) than nonshared TCRα sequences in both tTreg and Tconv in two normal strains of mice as well as in the TCRα+/− TCRβ Tg-Foxp3-GFP mice (Fig. 3A). This suggests that those shared TCRαβ-expressing Tconv and tTreg may either be derived from more abundant progenitor cells or might have undergone preferential expansion after differentiation. Shared TCRβ sequences were significantly more abundant (three to five times) in Tconv but not in tTreg in two normal strains of mice (Fig. 3B).

FIGURE 3.

Abundance of shared and nonshared TCRα and TCRβ sequences in tTreg and Tconv. (A) Abundance of TCRα sequences in tTreg and Tconv of three strains of mice. The number of UMI counts corresponding to those shared or nonshared TCRα sequences from all three individual samples of each strain, and the box whisker plot with mean (plus 1.5 × interquartile range) are presented for shared and nonshared TCRα within tTreg and Tconv. The data are presented as Log10 transformed values. (B) Abundance of TCRβ in tTreg and Tconv of two strains of mice. The number of UMI counts corresponding to those shared or nonshared TCRβ sequences from all three individual samples of each strain are presented as the box whisker plot. ***p < 0.001 using Mann–Whitney U test.

Last, if the tTreg and Tconv TCRα sequences of TCRα+/− TCRβ Tg-Foxp3-GFP mice are selected based on their sequences, we would expect to see conservation of these TCRα sequences in other strains of mice. To address this, we first pooled sequences from three samples of each strain and then compared tTreg and Tconv TCRα sequences of TCRα+/− TCRβ Tg-Foxp3-GFP mice with those from the non-Tg mice (Rag-GFP/Foxp3-RFP and TcrdCreERZsGreen-Foxp3-RFP). Indeed, we found a small overlap in TCRα sequences between TCRα+/− TCRβ Tg-Foxp3-GFP mice and the two non-Tg strains: 0.14 and 0.08% for distinct and total sequences (based on UMI counts) in tTreg-specific TCRα, 2.7 and 8.3% in Tconv-specific TCRα, and 1.3 and 7.6% in tTreg/Tconv–shared TCRα (Fig. 4).

FIGURE 4.

Conservation of TCRα sequences across mouse strains. Venn diagram displays the percentages of overlap of TCRα sequences (tTreg-specific, Tconv-specific, and tTreg/Tconv–shared) found in TCRα+/− TCRβ Tg-Foxp3-GFP mice with the pooled sequences from Rag-GFP/Foxp3-RFP and TcrdCreERZsGreen-Foxp3-RFP mice. The percentages in black refer to distinct TCRα sequences, and percentages in green refer to total TCRα sequences (based on UMI counts).

The identification of αβ TCR sequences found only in tTreg or only in Tconv could reflect true differences in the TCR repertoires of these populations or might be, at least in part, a consequence of the inability to sequence to saturation all TCR in the large Tconv population. We therefore next asked whether the αβ TCR sequences that were not identified as shared between tTreg and Tconv have distinct features that distinguish tTreg and Tconv. We applied an ML algorithm to compare the nonshared αβ TCR (V gene-CDR3 amino acids-J gene) of tTreg and Tconv. We first generated a library of contiguous 3-aa motifs (trimers) found in TCR CDR3 (35) and incorporated V and J gene information. We then used 70% of the combined distinct TCRα and TCRβ sequences of tTreg and Tconv as a training set and 30% as a testing set using a random forest classifier from scikit-learn (33). The model performance on the testing set was calculated using the area under the ROC curve (AUC). We found that nonshared TCRα of tTreg were distinguishable from TCRα of Tconv with AUC = 0.82 (Fig. 5A) and that TCRβ of tTreg were distinguishable from TCRβ of Tconv with AUC = 0.72 (Fig. 5B); thus, both TCRα and TCRβ nonshared sequences of tTreg were distinguished from their counterparts in Tconv.
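To make the reported values concrete, the area under the ROC curve can be read as the probability that a randomly chosen tTreg sequence receives a higher classifier score than a randomly chosen Tconv sequence, with 0.5 corresponding to chance. The minimal sketch below computes it that way; it is an illustration and not the scikit-learn routine used in the study, and the function name `roc_auc` and the label coding (1 for tTreg, 0 for Tconv) are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (here, tTreg) and 0 for the negative
    class (here, Tconv). scores: classifier scores, higher = more tTreg-like.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of sequences and is meant only to show the definition; rank-based implementations give the same value far more efficiently on large test sets.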

FIGURE 5.

Characterization of distinct nonshared TCRα and TCRβ sequences expressed by tTreg or Tconv. (A and B) ML classification of TCRα (A) and TCRβ (B) of tTreg and Tconv. The ROC and the associated areas under the curve (AUC) are presented. (C and D) Top 20 most abundant tTreg enriched trimers and their locations in CDR3α (C) and CDR3β (D). All trimers indicated were significantly enriched in tTreg (tTreg/Tconv ratio >1.5, FDR < 0.05). The location of the trimers is aligned to the N terminus of CDR3s with a length of 9 aa or greater. (E) Enriched amino acids in these tTreg trimers. Each amino acid present in the enriched trimers and in all trimers are summed and then divided by the total number of amino acids in these enriched trimers or all trimers from tTreg and Tconv. The resulting percentages and the ratios of percentage in enriched tTreg trimer/percentage in Tconv are presented.

To further determine the features of tTreg-restricted CDR3 amino acid sequences, we compared the abundance in tTreg and Tconv of specific trimers in the central region of CDR3, which mediates direct contact with Ag–MHC, excluding the N-terminal amino acid and the C-terminal 3 aa of CDR3 (24). We found that a number of trimers were significantly more abundant in tTreg than in Tconv (trimer ratio tTreg/Tconv ≥1.5, FDR < 0.05) in CDR3α (n = 49, found in 2.2% of total tTreg TCRα sequences) and CDR3β (n = 86, found in 1.2% of total tTreg TCRβ sequences) (Supplemental Table I); the 20 most abundant trimers for CDR3α and CDR3β are presented in Fig. 5C and 5D. Strikingly, two amino acids were highly enriched in these trimers in both CDR3α and CDR3β of tTreg: cysteine (enriched 6.8- and 3.9-fold compared with CDR3α and CDR3β of Tconv, respectively) and phenylalanine (enriched 2.9- and 1.7-fold compared with CDR3α and CDR3β of Tconv, respectively) (Fig. 5E), suggesting some common biophysical properties of tTreg TCRs. In addition, lysine was enriched 2.3-fold in CDR3α, and methionine was enriched 2.2-fold in CDR3β of tTreg. These findings indicate that TCRα and TCRβ in tTreg express a distribution of amino acids and trimer sequences that differs from that of Tconv.
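
The trimer comparison can be sketched as follows. This is an illustrative example only: it assumes that the trimer ratio is the fraction of tTreg sequences containing a trimer divided by the corresponding fraction in Tconv, which is one plausible reading of the text, and it omits the FDR step because the statistical test is not specified in this section. All function names are hypothetical.

```python
from collections import Counter

def central_trimers(cdr3, n_skip=1, c_skip=3):
    """Distinct trimers in the central CDR3 region, after dropping the
    N-terminal residue and the three C-terminal residues."""
    end = max(n_skip, len(cdr3) - c_skip)
    core = cdr3[n_skip:end]
    return {core[i:i + 3] for i in range(len(core) - 2)}

def trimer_frequencies(cdr3_list):
    """Fraction of sequences that contain each central trimer at least once."""
    counts = Counter()
    for cdr3 in cdr3_list:
        counts.update(central_trimers(cdr3))
    n = len(cdr3_list)
    return {trimer: c / n for trimer, c in counts.items()}

def enriched_trimers(ttreg_cdr3, tconv_cdr3, min_ratio=1.5):
    """Trimers whose tTreg/Tconv frequency ratio meets the threshold.

    Trimers absent from Tconv are skipped here to avoid division by zero.
    """
    f_treg = trimer_frequencies(ttreg_cdr3)
    f_conv = trimer_frequencies(tconv_cdr3)
    result = {}
    for trimer, freq in f_treg.items():
        if trimer in f_conv and freq / f_conv[trimer] >= min_ratio:
            result[trimer] = freq / f_conv[trimer]
    return result
```

In a full analysis, each candidate trimer would additionally be tested for significance and corrected for multiple comparisons before being reported as enriched.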

To determine the degree to which TCRα and TCRβ sequences shared by tTreg and Tconv resemble sequences that are found only in tTreg or only in Tconv, we applied the ML algorithm described above. Strikingly, we found that the great majority of TCRα and TCRβ sequences that are shared by tTreg and Tconv were classified as Tconv in origin by this algorithm (Fig. 6A, 6B). Specifically, 82.4% of shared TCRα and 91.9% of shared TCRβ sequences were classified as Tconv TCR based on cutoffs selected from the receiver operating characteristic curves of the TCRα and TCRβ ML classifiers, respectively. Progenitors that express these shared TCR can thus differentiate to the tTreg fate despite expressing TCR that are classified as more similar to those of Tconv.

FIGURE 6.

TCRα and TCRβ sequences shared by tTreg and Tconv resemble TCR found uniquely in Tconv. ML classification of TCRα (A) and TCRβ (B) sequences shared by tTreg and Tconv. A TCR sequence classified with p ≥ 0.8 is considered to be of Tconv origin, whereas one with p ≤ 0.2 is considered to be of tTreg origin. By this definition, 82.4% of shared TCRα sequences and 91.9% of shared TCRβ sequences were classified as being of Tconv origin.
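
The two-sided cutoff rule in this legend can be expressed as a small Python sketch. It assumes that p is the classifier's predicted probability for the Tconv class; the function names and the "unassigned" label for intermediate values are hypothetical additions for illustration.

```python
from collections import Counter

def call_origin(p_tconv, tconv_cutoff=0.8, ttreg_cutoff=0.2):
    """Assign a shared TCR to Tconv or tTreg from its predicted Tconv probability."""
    if p_tconv >= tconv_cutoff:
        return "Tconv"
    if p_tconv <= ttreg_cutoff:
        return "tTreg"
    # Probabilities between the two cutoffs are left uncalled.
    return "unassigned"

def percent_by_call(probabilities):
    """Percentage of sequences falling into each call category."""
    calls = Counter(call_origin(p) for p in probabilities)
    n = len(probabilities)
    return {label: 100.0 * calls[label] / n
            for label in ("Tconv", "tTreg", "unassigned")}
```

Using two cutoffs rather than a single 0.5 threshold means that only sequences with a clear prediction in either direction are counted toward a lineage.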

The factors that determine the selection of Tconv or tTreg fate during thymic development are not completely understood. We designed studies to assess the degree to which TCR sequence determines this lineage choice. We conducted TCRα and TCRβ sequencing of tTreg and Tconv using a UMI-based method. Comparing TCRα and TCRβ sequences between tTreg and Tconv from normal strains of mice, we found that a substantial proportion of TCRα sequences and TCRβ sequences were shared between tTreg and Tconv; in TCRα+/− TCRβ Tg-Foxp3-GFP mice, shared TCRα sequences or clonotypes were even more abundant. Notably, the TCRα and TCRβ sequences that were not shared between tTreg and Tconv were distinct in tTreg and Tconv, as recognized by an ML algorithm and by identification of amino acid trimers more commonly used in CDR3α and CDR3β of tTreg than in those of Tconv. Finally, ML indicated that the great majority of TCRs that are shared by tTreg and Tconv have features in common with the sequences distinct to Tconv but not with sequences distinct to tTreg. Our TCR sequence analysis identified two populations of tTreg, one in which tTreg fate is associated with unique properties of the TCR and another with TCR properties that are characteristic of Tconv and for which tTreg fate may therefore be influenced by factors other than TCR. As previously described in an instructive model of intraclonal competition, the tTreg fate decision can be influenced by the abundance of tTreg precursors during thymic development (36). Further, the potential presence of precursor tTreg within the Tconv population could affect the analysis of the TCR contribution.

It has been reported that tTreg development can proceed through two progenitor cell pathways, with mature CD25+Foxp3+ tTreg being generated from either CD25−Foxp3lo or CD25+Foxp3− precursors (37). These two tTreg progenitors produce functionally distinct mature tTreg. tTreg derived from CD25+ tTreg progenitors are able to prevent experimental autoimmune encephalitis, whereas tTreg from Foxp3lo progenitor cells are not, and TCR of CD25+ tTreg progenitors have a higher affinity than those of Foxp3lo progenitor cells. Sequence analysis of the Vα2 TCR family in that study revealed both distinct and overlapping sequences between these two progenitor populations. Whether these reported features of the Vα2 TCR family generalize across the entire TCR repertoire and whether the tTreg-specific TCR and the tTreg/Tconv–shared TCR that we have identified in this study have distinct progenitor origins will require further study.

Our study provides a quantitative assessment of the degree of uniqueness and similarity of αβ TCR between tTreg and Tconv. The distinct tTreg and Tconv sequences that we identified are present at a small but consistent percentage among individual mice within the same strain and between different strains, suggesting sequence conservation of tTreg- or Tconv-specific TCR characteristics. Furthermore, these conserved sequences occupy a larger fraction of total TCRα and TCRβ sequences based on UMI counts than on distinct TCR sequences. Several factors could affect the estimated proportion of shared TCR. Because the number of Tconv that could be used for sequencing was only a fraction of total Tconv in the thymus, the percentage of shared TCR sequences in tTreg may be an underestimate. In contrast, the existence of potential precursor tTreg in the Tconv population could increase apparent sharing, although it should be noted that the Tconv (Foxp3-RFP−) analyzed from Rag-GFP/Foxp3-RFP thymocytes were only ∼1% CD25+ and that Tconv for other analyses were >99% Foxp3−CD25−. The extensive sharing of αβ TCR between tTreg and Tconv is further confirmed by the analysis of TCRα sequences in a TCRβ Tg mouse, in which we found sharing of TCRα, and therefore sharing of identical αβ TCR clonotypes between tTreg and Tconv, in 71% of total tTreg. This high degree of sharing may result from the substantially reduced size of the TCRα repertoire that can be paired with the single Tg TCRβ. Together, these findings suggest that TCR sequence alone does not determine the fate of tTreg or Tconv for a significant proportion of tTreg.
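
The difference between measuring sharing by distinct sequences and by UMI counts can be made concrete with a short Python sketch. It assumes that each repertoire is available as a mapping from TCR sequence to UMI count; the function name `sharing_summary` and this input layout are hypothetical and are used only for illustration.

```python
def sharing_summary(ttreg_umis, tconv_umis):
    """Return the percentage of tTreg TCR shared with Tconv, computed two ways.

    ttreg_umis, tconv_umis: dicts mapping a TCR sequence to its UMI count.
    """
    shared = set(ttreg_umis) & set(tconv_umis)
    # Each distinct tTreg sequence counts once, regardless of abundance.
    pct_distinct = 100.0 * len(shared) / len(ttreg_umis)
    # Each tTreg sequence is weighted by the number of UMIs supporting it.
    shared_umis = sum(ttreg_umis[seq] for seq in shared)
    pct_umi = 100.0 * shared_umis / sum(ttreg_umis.values())
    return pct_distinct, pct_umi
```

When shared sequences are more abundant than nonshared ones, the UMI-weighted percentage exceeds the percentage based on distinct sequences.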

The UMI marking of TCR mRNA molecules allowed us to calculate the abundance of TCR sequences in tTreg and Tconv more accurately than prior approaches. This strategy led to the observation that those TCRα sequences in normal mice and αβ TCR clonotypes in TCRα+/− TCRβ Tg-Foxp3-GFP mice that are shared between tTreg and Tconv are significantly more abundant than the nonshared TCRα sequences and αβ TCR clonotypes. It is not clear whether the high abundance of those shared TCR arises prior to or after the fate decision of tTreg and Tconv. A study using fluorescent tracking of mature thymic CD4+ T cells and tTreg showed only one cell division postselection (38). This suggests that a high abundance of a given preselection progenitor may simply increase the probability that some progenitors of that clonotype will differentiate to tTreg and others to Tconv, reflecting a role of additional factors that drive Treg differentiation in concert with TCR signaling. However, TCRβ sequences shared by tTreg and Tconv were significantly more abundant than nonshared sequences expressed only in Tconv but not more abundant than nonshared sequences expressed only in tTreg. It remains to be determined whether selective expansion occurs after the fate decision in the thymus for Tconv that share TCR with tTreg.

ML has become an increasingly powerful tool in biological studies, in particular for those involving large datasets (39, 40). In this study, we applied ML to analyze TCRs by first partitioning each distinct TCR into its V, CDR3 (multiple continuous amino acid trimers), and J components and then determining whether nonshared TCRα and TCRβ sequences are distinct between tTreg and Tconv. Indeed, a random forest classifier was able to distinguish TCRα and TCRβ sequences of tTreg from those of Tconv (AUC = 0.82 and 0.72, respectively). The identification of specific trimers that occur at a higher frequency in tTreg CDR3α and CDR3β reveals that cysteine and phenylalanine are enriched in both CDR3α and CDR3β of tTreg, suggesting common structural features of at least some tTreg TCR. This ML algorithm can be further improved when more TCR sequences of tTreg and Tconv are generated and can also be modified to test paired αβ TCRs as single cell sequencing advances. In addition, we observed that lysine is enriched in CDR3α trimers, whereas methionine is enriched in CDR3β trimers of tTreg. Although the roles of these enriched trimers and specific amino acids are currently unknown, cysteine is reported to be enriched in CDR3α and CDR3β in intraepithelial lymphocytes and type A intraepithelial lymphocyte precursors (41). Interestingly, cysteine and phenylalanine are reported to be less frequent in CDR3β of MHC-restricted than in MHC-independent TCR-expressing thymocytes (24), and phenylalanine was found in CDR3β of self-reactive TCR (42). Thus, it is possible that the enriched amino acids and enriched trimers in tTreg CDR3 are involved in the interaction with self-antigen peptide rather than MHC during tTreg differentiation. Collectively, our findings demonstrate that TCR expressed by tTreg and not by Tconv have CDR3α and CDR3β sequences distinct from those of Tconv, supporting a critical role of TCR in tTreg generation in the thymus. Further studies will be needed to characterize the interaction of these TCRs with potentially selecting self-peptides to provide insights into the role of TCR in tTreg generation in the thymus.

Collectively, the results reported in this study have identified features that distinguish the αβ TCR sequences expressed by a significant proportion of tTreg from those expressed by Tconv in the thymus of normal mice and that therefore appear critical to determining differentiation into these lineages. The identification in tTreg of preferentially used trimers and selected amino acids in CDR3α and CDR3β provides molecular features for further understanding of TCR and Ag interaction in tTreg generation. For TCR clonotypes that are shared between tTreg and Tconv, the non-TCR factors that drive progenitors carrying the same TCR into either the tTreg or the Tconv lineage remain to be elucidated. Candidates for such factors include differential costimulatory signaling, cytokine requirements, and other aspects of the APC and thymic environment. The ML classification of tTreg and Tconv subsets by TCR sequence described in this study provides a strategy for dissecting the molecular pathways that mark these lineages at a single cell level. Combining ML and single cell analysis of αβ TCR sequence with transcriptome and other molecular parameters will allow better definition of the selection, function, and activation state of these T cell subsets.

We thank Wanjun Chen, Ethan Shevach, Angela Thornton, and Karen Hathcock for careful reviews and comments on this manuscript, Jeffrey Cifello for independently verifying some calculations used in the results, and Rong-Fong Shen and Wells Wu for help in TCR sequencing.

This research work was supported in part by the Intramural Research Programs of National Institutes of Health, the National Cancer Institute (to M.W. and R.J.H.), the National Institute on Aging (to N.-p.W.), and National Institutes of Health Grant R01 GM059638 (to Y.Z.). This study uses the computational resources of the National Institutes of Health High Performing Computation Biowulf cluster.

The sequences presented in this article have been submitted to the National Center for Biotechnology Information BioProject database under accession number 541952.

The online version of this article contains supplemental material.

Abbreviations used in this article: FDR, false discovery rate; ML, machine learning; ROC, receiver operating characteristic; Tconv, thymic conventional CD4+ T cell; Tg, transgenic; Treg, regulatory T cell; tTreg, thymic Treg; UMI, Unique Molecular Identifier.

1. Josefowicz, S. Z., L. F. Lu, A. Y. Rudensky. 2012. Regulatory T cells: mechanisms of differentiation and function. Annu. Rev. Immunol. 30: 531–564.
2. Josefowicz, S. Z., A. Rudensky. 2009. Control of regulatory T cell lineage commitment and maintenance. Immunity 30: 616–625.
3. Klein, L., E. A. Robey, C. S. Hsieh. 2019. Central CD4+ T cell tolerance: deletion versus regulatory T cell differentiation. Nat. Rev. Immunol. 19: 7–18.
4. Chen, W., J. E. Konkel. 2015. Development of thymic Foxp3(+) regulatory T cells: TGF-β matters. Eur. J. Immunol. 45: 958–965.
5. Malhotra, D., M. K. Jenkins. 2016. Regulatory T cells: a crisis averted. Immunity 44: 1079–1081.
6. Bilate, A. M., J. J. Lafaille. 2012. Induced CD4+Foxp3+ regulatory T cells in immune tolerance. Annu. Rev. Immunol. 30: 733–758.
7. Plitas, G., A. Y. Rudensky. 2016. Regulatory T cells: differentiation and function. Cancer Immunol. Res. 4: 721–725.
8. Shevach, E. M., A. M. Thornton. 2014. tTregs, pTregs, and iTregs: similarities and differences. Immunol. Rev. 259: 88–102.
9. Lee, H. M., J. L. Bautista, J. Scott-Browne, J. F. Mohan, C. S. Hsieh. 2012. A broad range of self-reactivity drives thymic regulatory T cell selection to limit responses to self. Immunity 37: 475–486.
10. Stritesky, G. L., S. C. Jameson, K. A. Hogquist. 2012. Selection of self-reactive T cells in the thymus. Annu. Rev. Immunol. 30: 95–114.
11. Malchow, S., D. S. Leventhal, V. Lee, S. Nishi, N. D. Socci, P. A. Savage. 2016. Aire enforces immune tolerance by directing autoreactive T cells into the regulatory T cell lineage. Immunity 44: 1102–1113.
12. Perry, J. S. A., C. J. Lio, A. L. Kau, K. Nutsch, Z. Yang, J. I. Gordon, K. M. Murphy, C. S. Hsieh. 2014. Distinct contributions of Aire and antigen-presenting-cell subsets to the generation of self-tolerance in the thymus. Immunity 41: 414–426.
13. Klein, L., B. Kyewski, P. M. Allen, K. A. Hogquist. 2014. Positive and negative selection of the T cell repertoire: what thymocytes see (and don’t see). Nat. Rev. Immunol. 14: 377–391.
14. Bayer, A. L., A. Yu, D. Adeegbe, T. R. Malek. 2005. Essential role for interleukin-2 for CD4(+)CD25(+) T regulatory cell development during the neonatal period. J. Exp. Med. 201: 769–777.
15. Ouyang, W., O. Beckett, Q. Ma, M. O. Li. 2010. Transforming growth factor-beta signaling curbs thymic negative selection promoting regulatory T cell development. Immunity 32: 642–653.
16. Liu, Y., P. Zhang, J. Li, A. B. Kulkarni, S. Perruche, W. Chen. 2008. A critical function for TGF-beta signaling in the development of natural CD4+CD25+Foxp3+ regulatory T cells. Nat. Immunol. 9: 632–640.
17. Salomon, B., D. J. Lenschow, L. Rhee, N. Ashourian, B. Singh, A. Sharpe, J. A. Bluestone. 2000. B7/CD28 costimulation is essential for the homeostasis of the CD4+CD25+ immunoregulatory T cells that control autoimmune diabetes. Immunity 12: 431–440.
18. Hsieh, C. S., Y. Zheng, Y. Liang, J. D. Fontenot, A. Y. Rudensky. 2006. An intersection between the self-reactive regulatory and nonregulatory T cell receptor repertoires. Nat. Immunol. 7: 401–410.
19. Pacholczyk, R., H. Ignatowicz, P. Kraj, L. Ignatowicz. 2006. Origin and T cell receptor diversity of Foxp3+CD4+CD25+ T cells. Immunity 25: 249–259.
20. Bergot, A. S., W. Chaara, E. Ruggiero, E. Mariotti-Ferrandiz, S. Dulauroy, M. Schmidt, C. von Kalle, A. Six, D. Klatzmann. 2015. TCR sequences and tissue distribution discriminate the subsets of naïve and activated/memory Treg cells in mice. Eur. J. Immunol. 45: 1524–1534.
21. Relland, L. M., J. B. Williams, G. N. Relland, D. Haribhai, J. Ziegelbauer, M. Yassai, J. Gorski, C. B. Williams. 2012. The TCR repertoires of regulatory and conventional T cells specific for the same foreign antigen are distinct. J. Immunol. 189: 3566–3574.
22. Wolf, K. J., R. O. Emerson, J. Pingel, R. M. Buller, R. J. DiPaolo. 2016. Conventional and regulatory CD4+ T cells that share identical TCRs are derived from common clones. PLoS One 11: e0153705.
23. Shugay, M., O. V. Britanova, E. M. Merzlyak, M. A. Turchaninova, I. Z. Mamedov, T. R. Tuganbaev, D. A. Bolotin, D. B. Staroverov, E. V. Putintseva, K. Plevova, et al. 2014. Towards error-free profiling of immune repertoires. Nat. Methods 11: 653–655.
24. Lu, J., F. Van Laethem, A. Bhattacharya, M. Craveiro, I. Saba, J. Chu, N. C. Love, A. Tikhonova, S. Radaev, X. Sun, et al. 2019. Molecular constraints on CDR3 for thymic selection of MHC-restricted TCRs from a random pre-selection repertoire. Nat. Commun. 10: 1019.
25. Yu, W., H. Nagaoka, Z. Misulovin, E. Meffre, H. Suh, M. Jankovic, N. Yannoutsos, R. Casellas, E. Besmer, F. Papavasiliou, et al. 1999. RAG expression in B cells in secondary lymphoid tissues. Cold Spring Harb. Symp. Quant. Biol. 64: 207–210.
26. Wan, Y. Y., R. A. Flavell. 2005. Identifying Foxp3-expressing suppressor T cells with a bicistronic reporter. Proc. Natl. Acad. Sci. USA 102: 5126–5131.
27. Zhang, B., Q. Jia, C. Bock, G. Chen, H. Yu, Q. Ni, Y. Wan, Q. Li, Y. Zhuang. 2016. Glimpse of natural selection of long-lived T-cell clones in healthy life. Proc. Natl. Acad. Sci. USA 113: 9858–9863.
28. Shinkai, Y., S. Koyasu, K. Nakayama, K. M. Murphy, D. Y. Loh, E. L. Reinherz, F. W. Alt. 1993. Restoration of T cell development in RAG-2-deficient mice by functional TCR transgenes. Science 259: 822–825.
29. Hathcock, K. S., S. Bowen, F. Livak, R. J. Hodes. 2013. ATM influences the efficiency of TCRβ rearrangement, subsequent TCRβ-dependent T cell development, and generation of the pre-selection TCRβ CDR3 repertoire. PLoS One 8: e62188.
30. Korn, T., J. Reddy, W. Gao, E. Bettelli, A. Awasthi, T. R. Petersen, B. T. Bäckström, R. A. Sobel, K. W. Wucherpfennig, T. B. Strom, et al. 2007. Myelin-specific regulatory T cells accumulate in the CNS but fail to control autoimmune inflammation. Nat. Med. 13: 423–431.
31. Lefranc, M. P., C. Pommié, M. Ruiz, V. Giudicelli, E. Foulquier, L. Truong, V. Thouvenin-Contet, G. Lefranc. 2003. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev. Comp. Immunol. 27: 55–77.
32. Laydon, D. J., C. R. Bangham, B. Asquith. 2015. Estimating T-cell repertoire diversity: limitations of classical estimators and a new approach. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370: 20140291.
33. Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12: 2825–2830.
34. Jones, E., E. Oliphant, P. Peterson, et al. 2001. SciPy: Open Source Scientific Tools for Python. Available at: http://www.scipy.org/. Accessed May 12, 2018.
35. Glanville, J., H. Huang, A. Nau, O. Hatton, L. E. Wagar, F. Rubelt, X. Ji, A. Han, S. M. Krams, C. Pettus, et al. 2017. Identifying specificity groups in the T cell receptor repertoire. Nature 547: 94–98.
36. Bautista, J. L., C. W. Lio, S. K. Lathrop, K. Forbush, Y. Liang, J. Luo, A. Y. Rudensky, C. S. Hsieh. 2009. Intraclonal competition limits the fate determination of regulatory T cells in the thymus. Nat. Immunol. 10: 610–617.
37. Owen, D. L., S. A. Mahmud, L. E. Sjaastad, J. B. Williams, J. A. Spanier, D. R. Simeonov, R. Ruscher, W. Huang, I. Proekt, C. N. Miller, et al. 2019. Thymic regulatory T cells arise via two distinct developmental programs. Nat. Immunol. 20: 195–205.
38. Föhse, L., A. Reinhardt, L. Oberdörfer, S. Schmitz, R. Förster, B. Malissen, I. Prinz. 2013. Differential postselection proliferation dynamics of αβ T cells, Foxp3+ regulatory T cells, and invariant NKT cells monitored by genetic pulse labeling. J. Immunol. 191: 2384–2392.
39. Camacho, D. M., K. M. Collins, R. K. Powers, J. C. Costello, J. J. Collins. 2018. Next-generation machine learning for biological networks. Cell 173: 1581–1592.
40. Zou, J., M. Huss, A. Abid, P. Mohammadi, A. Torkamani, A. Telenti. 2019. A primer on deep learning in genomics. Nat. Genet. 51: 12–18.
41. Wirasinha, R. C., M. Singh, S. K. Archer, A. Chan, P. F. Harrison, C. C. Goodnow, S. R. Daley. 2018. αβ T-cell receptors with a central CDR3 cysteine are enriched in CD8αα intraepithelial lymphocytes and their thymic precursors. Immunol. Cell Biol. 96: 553–561.
42. Stadinski, B. D., K. Shekhar, I. Gómez-Touriño, J. Jung, K. Sasaki, A. K. Sewell, M. Peakman, A. K. Chakraborty, E. S. Huseby. 2016. Hydrophobic CDR3 residues promote the development of self-reactive T cells. Nat. Immunol. 17: 946–955.

The authors have no financial conflicts of interest.