During infection and autoimmune disease, T cells are activated and expand. Consequently, the TCR repertoire contains information about ongoing and past diseases. Analysis and interpretation of the human TCR repertoire are hampered by its size and stochastic variation and by the diversity of Ags and of the Ag-presenting molecules encoded by the MHC, but are highly desirable and would greatly impact fundamental and clinical immunology. A subset of the TCR repertoire is formed by invariant T cells. Invariant T cells express interdonor-conserved TCRs and recognize a limited set of Ags presented by nonpolymorphic Ag-presenting molecules. The three known invariant T cell populations were discovered one by one, in a slow and laborious process. Because the TCR α-chain of invariant T cells is much more conserved than the β-chain, and because the TCR α-chain V gene segment TRAV1-2 is used by two of the three known invariant TCRs, we employed next-generation sequencing of TCR α-chains that contain the TRAV1-2 gene segment to identify 16 invariant TCRs shared among many blood donors. Frequency analysis of individual clones indicates that these T cells are expanded in many donors, implying an important role in human immunity. This approach extends the number of known interdonor-conserved TCRs and suggests that many more exist and that these TCR patterns can be used to systematically evaluate human Ag exposure.

The most remarkable feature of TCRs is their diversity and the mechanisms that generate it. Surprisingly, in some T cells these diversity-generating mechanisms give rise to receptors that are simple and highly conserved among individuals: NKT cells, mucosal-associated invariant T (MAIT) cells, and germline-encoded mycolyl lipid–reactive (GEM) T cells. These cells use a TCR consisting of an invariant TCR α-chain with very few nontemplated (N) nucleotides, paired with TCR β-chains drawn from a more diverse but biased repertoire. All known invariant T cells recognize nonpolymorphic MHC class I–like molecules liganded with nonpeptidic Ags. Although all MAIT cells use the TCR α-chain J segment TRAJ33 and all GEM T cells use TRAJ9, both use the V segment TRAV1-2, formerly called Vα7.2. TRAV1-2 is an evolutionarily conserved gene segment located at the most distal position of the TRAV/DV locus, embedded among olfactory receptor genes (1, 2).

NKT cells, MAIT cells, and GEM T cells recognize nonpeptidic Ags bound to the nonpolymorphic Ag-presenting molecules CD1d, MR1, and CD1b, respectively. A diverse T cell repertoire is necessary to recognize the vast array of peptide Ags presented by classical MHC molecules, which have many allelic variants in the human population. In theory, much less TCR diversity is necessary to recognize nonpolymorphic MHC-like molecules, given the smaller number of Ags and the lack of person-to-person variation in the Ag-presenting molecules. Recent discoveries continue to expand the spectrum of complexes of nonpolymorphic Ag-presenting molecules and nonpeptidic Ags. Each complex is a potential target of one or possibly several invariant T cell populations, which raises the possibility that many more undiscovered invariant T cells exist in the human T cell repertoire. However, most nonpolymorphic antigenic complexes have not been studied systematically, and the potential for many types of invariant T cell populations in the human TCR repertoire has not been evaluated with next-generation sequencing methods. Because TCR α-chain conservation within an invariant T cell population is almost absolute, TCR α-chain datasets can be used to discover new invariant T cells. Despite the many available TCR β-chain datasets, there is only one publicly available dataset, derived from one blood donor, that also includes TCR α-chains (3). The reason for this imbalance is partly historical and partly technical, because the TCR-α locus contains many more gene segments than the TCR-β locus. Thus, to identify TCR α-chains that are conserved among the human population, we generated TCR α-chain datasets derived from multiple donors.

Even though TRAV1-2 is also used by conventional, diverse T cells that are restricted by classical MHC, it may be preferentially used in the generation of invariant T cells beyond the GEM T cells and MAIT cells already known to use it. Using data-filtering methods to identify TCR α-chains that are conserved among individuals and contain few N nucleotides, we found 16 new invariant TCR α-chains that use TRAV1-2. Considerable expansion of these new invariant T cells was detected in some donors. Identifying invariant T cells by TCR sequence, without prior knowledge of their specificity and function, enables targeted subsequent functional characterization of these cells. Because these invariant TCRs are conserved among unrelated human donors, these data strongly support the feasibility of the long-sought goal of TCR-based evaluation of infectious disease status and of other disease processes with T cell involvement, such as cancer and autoimmunity.

For the TRAV1-2 dataset, blood was obtained from asymptomatic tuberculin-positive donors clinically assessed to have latent tuberculosis but with no clinical or radiographic evidence of active tuberculosis (samples starting with “C”), and from blood-bank donors (samples starting with “B”), after informed consent was obtained, as approved by the institutional review boards of Lemuel Shattuck Hospital and Partners Healthcare (Boston, MA). For the Academic Medical Center dataset, blood was obtained from five different random blood bank donors at Sanquin (Amsterdam, the Netherlands). High-resolution HLA genotyping was performed by next-generation sequencing using the 454 Life Sciences GS FLX system and Titanium chemistry, as previously described (4).

Cells were incubated with mAbs against TRAV1-2 (clone 3C10; BioLegend) and CD4 (allophycocyanin-conjugated clone RPA-T4; BD Biosciences) for 30 min on ice. Cells were pregated on lymphocytes based on forward and side scatter and sorted on an 11-color FACSAria (BD Biosciences). RNA was isolated with an RNeasy kit (Qiagen), and first-strand cDNA was synthesized with a QuantiTect reverse-transcription kit (Qiagen), including a genomic DNA-removal step. The generation of amplicons for next-generation sequencing has been described previously (5, 6). The cDNA was amplified using a full-repertoire approach, with multiple primers covering all known V genes in a linear amplification protocol designed to prevent bias between primers. For this work, only the TRAV1-2 primer is relevant, because either the cells were FACS sorted using a TRAV1-2–specific Ab (TRAV1-2 dataset) or only the TRAV1-2 sequences were used (Academic Medical Center dataset). The TRAV1-2–specific primer is 5′-GGACAAARCMTTGASCAGCC-3′ (7). The Vα primers were tailed with the primerB sequence of the LibA system (Titanium protocol) (5′-CTATGCGCCTTGCCAGCCCGCTCAG-3′) (Roche/454). In the first, linear amplification step, the cDNA samples were amplified in a volume of 20 μl containing 1× Buffer B (Solis BioDyne), 0.1 mM of each dNTP, 1 mM MgCl2, 0.25 U/μl HotFire polymerase (Solis BioDyne), and 0.2 pmol/μl of each Vα primer. Amplification was run on a T1 thermocycler (Biometra) with the following cycling conditions: 1 × 96°C for 15 min; 40 × (96°C for 30 s, 60°C for 1 min, and 72°C for 30 s); and 1 × 72°C for 10 min. After amplification, the amplicons were purified with AMPure SPRI beads (Agencourt) according to the manufacturer's instructions, using equal volumes of amplicon and beads.
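The TRAV1-2 primer above contains IUPAC ambiguity codes (R, M, and S), so it is in practice a mixture of oligonucleotides. The following Python sketch is illustrative only and not part of the published protocol. It enumerates the concrete primer sequences and prepends the primerB tail. Treating tailing as plain 5′ concatenation is an assumption, and the helper names `expand_degenerate` and `tailed_primers` are hypothetical.

```python
from itertools import product
from typing import List

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

TRAV1_2_PRIMER = "GGACAAARCMTTGASCAGCC"      # TRAV1-2-specific primer, 5'->3'
PRIMER_B_TAIL = "CTATGCGCCTTGCCAGCCCGCTCAG"  # LibA primerB sequence, 5'->3'


def expand_degenerate(primer: str) -> List[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def tailed_primers(primer: str, tail: str = PRIMER_B_TAIL) -> List[str]:
    """Hypothetical helper: fuse the 454 tail to the 5' end of each variant."""
    return [tail + variant for variant in expand_degenerate(primer)]


variants = expand_degenerate(TRAV1_2_PRIMER)
print(len(variants))  # R, M and S each allow 2 bases, so 2 * 2 * 2 = 8
print(tailed_primers(TRAV1_2_PRIMER)[0])
```

Enumerating the variants makes explicit that the single primer name stands for eight oligonucleotides, each 45 nt long once tailed.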
In the second step, PCR was performed using the primerB sequence of the Roche/454 LibA system as the forward primer and, as the reverse primer, a Cα-specific primer (5′-TCTCAGCTGGTACACGGCAG-3′) (7) tailed with both a genetic barcode and the primerA sequence of the LibA system (Titanium protocol) (5′-CGTATCGCCTCCCTCGCGCCATCAG-3′) (Roche/454). The reaction composition was identical to that of the linear amplification step, and the cycling conditions were as follows: 1 × 96°C for 15 min; 35 × (96°C for 30 s, 60°C for 1 min, and 72°C for 30 s); and 1 × 72°C for 10 min. After amplification, the amplicons were again purified with AMPure SPRI beads (Agencourt).
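As described above, the reverse primer combines the LibA primerA sequence, a sample barcode, and the Cα-specific primer, and reads are later sorted per sample by that barcode. A minimal Python sketch of this layout follows. The barcode sequences and sample names are hypothetical, because the barcodes actually used are not given in the text. We also assume that the barcode sits directly between primerA and the Cα primer, and that each read begins with the barcode followed by the Cα primer.

```python
from typing import Dict, Optional

PRIMER_A = "CGTATCGCCTCCCTCGCGCCATCAG"  # LibA primerA sequence, 5'->3'
C_ALPHA = "TCTCAGCTGGTACACGGCAG"        # Calpha-specific primer, 5'->3'

# Hypothetical 10-nt barcodes; the barcodes actually used are not reported here.
BARCODES: Dict[str, str] = {
    "ACGAGTGCGT": "sample_01",
    "ACGCTCGACA": "sample_02",
}


def fusion_primer(barcode: str) -> str:
    """Assumed layout of the tailed reverse primer: primerA + barcode + Calpha primer."""
    return PRIMER_A + barcode + C_ALPHA


def assign_sample(read: str, barcodes: Dict[str, str]) -> Optional[str]:
    """Return the sample whose barcode (followed by the Calpha primer) starts the read."""
    for barcode, sample in barcodes.items():
        if read.startswith(barcode + C_ALPHA):
            return sample
    return None  # unassignable reads would be discarded


print(fusion_primer("ACGAGTGCGT"))
print(assign_sample("ACGAGTGCGT" + C_ALPHA + "GTTGATCTGGTA", BARCODES))  # sample_01
```

Requiring the Cα primer to follow the barcode is a conservative design choice that reduces spurious barcode matches.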

Sequencing was performed on the Roche/454 Genome Sequencer using the Titanium platform. Sample preparation and quality control were performed according to the manufacturer's instructions.

TCR α-chain sequences were analyzed using a previously described pipeline (5). Briefly, sequences were sorted per sample using their genetic barcodes. V and J segments were then identified by comparison against the IMGT database (8) using the BLAST-like alignment tool (9) with default settings, and the V and J segments with the highest score and percent identity were assigned to each sequence. Sequences that were out of frame or too short to assign V and J segments unequivocally were discarded. The CDR3 was defined as the region from the cysteine at aa position 104 to the phenylalanine-glycine motif at aa positions 118–119 (10). N and P nucleotides were reported as the region between the V and J alignments and were treated together as N nucleotides in the filtering strategy. Sequences were denoted as duplicates if they had identical Vα, Jα, and CDR3α at the nucleotide level. Data were analyzed and filtered as described in the Results, using in-house–developed R scripts. The degree of expansion was calculated as the number of sequences per clone, expressed as a percentage of the total number of valid sequences per sample. Clones forming >1% of a sample were considered highly expanded, those forming 0.1–1% medium expanded, and those forming <0.1% low or not expanded.
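The duplicate removal and expansion categories described above can be sketched in Python as follows. This is not the authors' R pipeline. It assumes V/J assignment and CDR3 extraction have already been done. The text does not specify how values of exactly 0.1% or 1% are binned, so that choice is also an assumption. The read data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# A read reduced to (V segment, J segment, CDR3 nucleotide sequence).
Read = Tuple[str, str, str]


def clone_percentages(reads: Iterable[Read]) -> Dict[Read, float]:
    """Collapse duplicate reads into clones; return each clone as % of valid reads."""
    counts = Counter(reads)              # duplicates: identical V, J and CDR3 nt
    total = sum(counts.values())
    return {clone: 100.0 * n / total for clone, n in counts.items()}


def expansion_class(percent: float) -> str:
    """Thresholds from the text; binning of exactly 1% and 0.1% is an assumption."""
    if percent > 1.0:
        return "high"
    if percent >= 0.1:
        return "medium"
    return "low"


# Toy sample of 2,000 valid reads (hypothetical data).
reads = ([("TRAV1-2", "TRAJ33", "TGTGCTGTG")] * 40
         + [("TRAV1-2", "TRAJ9", "TGTGCTGCA")] * 10
         + [("TRAV1-2", "TRAJ4", "TGTGCCGTG")] * 1
         + [("TRAV1-2", "TRAJ20", "TGCGCTGTG")] * 1949)
for clone, pct in clone_percentages(reads).items():
    print(clone[1], round(pct, 2), expansion_class(pct))
```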

To generate sequences of TCR α-chains that use TRAV1-2, we sorted T cells from six human subjects based on binding of Abs against TRAV1-2 and CD4 (Table I, Fig. 1A). We split the TRAV1-2+ population into TRAV1-2+CD4+ and TRAV1-2+CD4− cells and treated them as separate samples, because MAIT and GEM cells both express TRAV1-2 but have been described to exist mostly as CD4− and CD4+ cells, respectively (11, 12). Unlike for classical MHC-restricted T cells, the expression of CD4 or CD8, or the lack thereof, has no known effect on the functionality of T cells that recognize nonpolymorphic Ag-presenting molecules (13, 14). We did not separate cells based on CD8 expression, to avoid effects of cell activation: most anti-CD8 Abs detect CD8αα, which is regulated according to the activation status of the cell. Separation based on CD4 appears more robust, because the presence or absence of CD4 is a stable feature of a T cell. Thus, the TRAV1-2+CD4− population contains CD8αβ+ T cells and double-negative T cells, and the CD4+ population contains CD4+CD8αβ− cells, regardless of their activation status. Using 454 sequencing technology, ∼10,000 TCR α-chain sequences were obtained from each of the 12 samples (TRAV1-2+CD4+ and TRAV1-2+CD4− populations from six subjects). Sequence read length ranged from 379 to 403 nt (mean, 385 nt). Nonproductive α-chains, comprising out-of-frame sequences and sequences with internal stop codons, were removed, as were sequences too short to assign V and J segment usage (Fig. 1B). We then removed all duplicates, defined as sequence reads with identical V and J segment usage and identical joints, and thus identical CDR3s; such reads are most likely derived from one T cell clone. The resulting 10,823 sequences (called clones) are thought to represent distinct, unique T cell clones.
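The removal of nonproductive α-chains can be illustrated with the standard genetic code: a sequence is kept only if it is in frame and translates without internal stop codons. The Python sketch below applies this check to an already-extracted junction (CDR3) nucleotide sequence. This simplifies the published procedure, in which frame was assessed relative to the V and J alignments. The function names are hypothetical.

```python
# Standard genetic code, built from the usual TCAG-ordered 64-letter string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt: str) -> str:
    """Translate codon by codon; '*' is a stop codon, 'X' an unreadable codon."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def is_productive(junction_nt: str) -> bool:
    """Keep in-frame junctions (length divisible by 3) without stop codons."""
    return len(junction_nt) % 3 == 0 and "*" not in translate(junction_nt)


print(translate("TGTGCTGTG"))        # CAV
print(is_productive("TGTGCTGTG"))    # True
print(is_productive("TGTTAGGTG"))    # False: TAG is a stop codon
print(is_productive("TGTGCTGT"))     # False: out of frame
```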

Table I.
Patients
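| Subject | Infection | Gender | Age | HLA-A | HLA-B | HLA-C | HLA-DP | DQA | DQB | DRB1 | DRB3 | DRB4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C28 | Latent TB | Male | 44 | | | | | | | | | |
| C34 | Latent TB | Female | 33 | 02:01 | 13:02 | 06:02 | 04:01 | 05 | 03:01 | 11:04 | 02:02 | |
| | | | | 30:01 | 51:01 | 15:09 | 10:01 | 05 | 03:01 | 14:06 | 01:01 | |
| C52 | Latent TB | Female | 35 | 23:01 | 35:01 | 04:01 | 01:01 | 04 | 03:19 | 18:04 | 01:01 | |
| | | | | 68:02 | 53:01 | 04:01 | 01:01 | 05 | 03:01 | 13:03 | | |
| C58 | Latent TB | Male | 56 | 02:01 | 35:03 | 07:01 | 04:01 | 03 | 03:02 | 04:04 | | 01:01 |
| | | | | 03:01 | 58:01 | 12:03 | 04:02 | 01:02 | 06:09 | 13:02 | 03:01 | |
| B36 | | | | 01:01 | 08:02 | 06:02 | 04:01 | 05 | 02:01 | 03:01 | 01:01 | |
| | | | | 01:01 | 13:02 | 07:01 | 04:01 | 02:01 | 02:02 | 07:01 | | 01:01 |
| B38 | | | | 01:01 | 18:01 | 02:02 | 04:02 | 02:01 | 02:02 | 07:01 | | 01:01 |
| | | | | 32:01 | 40:02 | 07:01 | 04:02 | 05 | 03:01 | 11:04 | 02:02 | |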

TB, tuberculosis; ?, data not available.

FIGURE 1.

Identification of invariant TRAV1-2+ α-chains. (A) TRAV1-2+CD4+ and TRAV1-2+CD4− cells were sorted from asymptomatic tuberculin-positive patients (designated C) or random blood bank donors (designated B). (B) TCR α-chain sequences were derived from the sorted populations, as well as from external sources of nonsorted T cells. (C) A filtering method based on the characteristics that define invariant TCR α-chains was applied, and confirmation of the resulting candidates was sought in external datasets. (D) Motifs were determined for each of the resulting 16 candidates. ᵃ Clones are defined as groups of nucleotide sequences that use the same V and J segments and the same CDR3 sequence; the number of clones in a dataset is obtained by removing duplicates of the same rearrangement, leaving the number of unique rearrangements. ᵇ The publicly available dataset described by Wang et al. (3) was analyzed using our pipeline. ᶜ The most abundant variant that passed the filters was selected as the consensus invariant α-chain, and the less abundant variant was considered a variant covered by the motif. ᵈ X denotes a variable amino acid position in a motif. AMC, dataset from an unrelated project at the Academic Medical Center, Amsterdam.

The known invariant α-chains expressed by NKT cells, MAIT cells, and GEM T cells fulfill two criteria: they are present in most, if not all, individuals, and they contain very few N nucleotides (12, 15, 16). These invariant α-chains consist of one predominant consensus amino acid sequence and occasionally diverge from it by up to 2 aa while keeping the chain length identical. In addition, invariant consensus sequences can be encoded by slightly different nucleotide sequences. Therefore, to identify invariant α-chain candidates, we considered only α-chains with 0, 1, 2, or 3 N nucleotides (Fig. 1C) (15). We then translated these α-chains into protein sequences and generated a list of unique protein sequences. Of these, we considered only sequences present in at least half of the samples, because one feature of invariant T cells is their presence in multiple subjects. Of the 33 α-chains that passed this filter (Supplemental Fig. 1A), 14 were removed because they belonged to a known type of invariant T cell: they were MAIT α-chains that use TRAJ33 to form the CDR3 sequence CAXXDSNYQLIWGAG, or TRAJ12 to form the CDR3 sequence CAVXDSSYKLIFG, where X can be any amino acid. The remaining 19 consensus sequences were grouped into 16 invariant α-chain motifs with identical TRAJ segment and CDR3 length (Fig. 1D). Of these 16 motifs, 1 is found in four of six subjects, 7 in five of six subjects, and 8 in all six subjects. We then recorded the presence or absence of the new invariant α-chain consensus sequences in independent external samples that were not sorted for TRAV1-2 (3). Together, these external datasets contained ∼1600 TRAV1-2 α-chains.
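The filtering cascade described above can be sketched in Python as follows. It keeps chains with at most three N nucleotides, requires presence in at least half of the samples, and removes known MAIT α-chains. The input format, the function name, and the toy CDR3 sequences are hypothetical. The MAIT exclusion here matches only the two CDR3 patterns quoted in the text and does not check J segment usage, which the original analysis also used.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Known MAIT CDR3 patterns quoted in the text; X (any amino acid) becomes '.'.
KNOWN_MAIT = [re.compile("CA..DSNYQLIWGAG"), re.compile("CAV.DSSYKLIFG")]


def invariant_candidates(
    samples: Dict[str, Iterable[Tuple[str, int]]],
    max_n: int = 3,
    min_fraction: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """samples maps sample id -> (CDR3 amino acid sequence, N-nucleotide count) per clone.

    Returns (new candidates, known MAIT chains) among CDR3s found in enough samples.
    """
    present_in = defaultdict(set)
    for sample_id, clones in samples.items():
        for cdr3_aa, n_count in clones:
            if n_count <= max_n:                   # few nontemplated nucleotides
                present_in[cdr3_aa].add(sample_id)
    needed = min_fraction * len(samples)           # e.g., 6 of 12 samples
    shared = {aa for aa, ids in present_in.items() if len(ids) >= needed}
    known = {aa for aa in shared if any(p.fullmatch(aa) for p in KNOWN_MAIT)}
    return sorted(shared - known), sorted(known)


toy = {  # hypothetical clones from four samples
    "s1": [("CAVRDGNKLVF", 1), ("CAAMDSNYQLIWGAG", 0), ("CAVQGGSEKLVF", 9)],
    "s2": [("CAVRDGNKLVF", 2), ("CAAMDSNYQLIWGAG", 1)],
    "s3": [("CAVRDGNKLVF", 0), ("CAVQGGSEKLVF", 9)],
    "s4": [("CAVLDGNKLVF", 3)],
}
print(invariant_candidates(toy))
```

In the toy data, CAVQGGSEKLVF is shared by two samples but is dropped for its nine N nucleotides, which mirrors the rationale for the N-nucleotide filter.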
Finally, because variants of MAIT, NKT, and GEM T cells exist that differ at one or sometimes two positions of the amino acid consensus sequence, we looked for variants of our 19 new consensus CDR3α sequences that differ at one amino acid position from the consensus sequences and that were present in at least 3 of 12 samples. The location of the variable amino acids is indicated with an X in the CDR3 sequence (Fig. 1D). Thus, we were able to assign 16 different new invariant α-chain motifs within the human TRAV1-2 repertoire.
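The variant search can be expressed as a Hamming-distance check. A candidate variant has the same length as the consensus, differs from it at exactly one position, and occurs in at least 3 of the 12 samples. Variable positions are then written as X. The Python sketch below assumes that per-CDR3 sample counts have already been tabulated. The function names and toy sequences are hypothetical.

```python
from typing import Dict, Iterable, List


def one_aa_variants(consensus: str, sample_counts: Dict[str, int],
                    min_samples: int = 3) -> List[str]:
    """Same-length CDR3s that differ from the consensus at exactly one position."""
    variants = []
    for seq, n_samples in sample_counts.items():
        if seq == consensus or len(seq) != len(consensus) or n_samples < min_samples:
            continue
        mismatches = sum(a != b for a, b in zip(seq, consensus))
        if mismatches == 1:
            variants.append(seq)
    return sorted(variants)


def motif(consensus: str, variants: Iterable[str]) -> str:
    """Write every position that differs among the sequences as 'X'."""
    seqs = [consensus, *variants]
    return "".join(col[0] if len(set(col)) == 1 else "X" for col in zip(*seqs))


counts = {  # hypothetical: number of samples (of 12) containing each CDR3
    "CAVRDGNKLVF": 10,
    "CAVLDGNKLVF": 4,   # one mismatch, enough samples -> variant
    "CAVKDGNKLVF": 2,   # one mismatch, too few samples
    "CAVLDGSKLVF": 5,   # two mismatches
}
found = one_aa_variants("CAVRDGNKLVF", counts)
print(found, motif("CAVRDGNKLVF", found))  # ['CAVLDGNKLVF'] CAVXDGNKLVF
```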

Of note, T cells have been described that recognize MR1 and were classified as MAIT cells based on their specificity, but that do not express the typical TRAJ33-containing MAIT TCR α-chain (11, 17–19). Instead, these cells express a particular TRAV1-2+ α-chain that uses either TRAJ20 or TRAJ12, in both cases with a CDR3 length of 13 aa. The TRAJ12-containing type fits a motif that we identified independently in this work. Thus, using a sequence of heuristic filters, we reidentified canonical and atypical MAIT TCR α-chains and found 16 new invariant α-chains.

Alternative explanations for interdonor-conserved sequences are cross-contamination of samples and public TCRs. The simple V-J joint of invariant TCR α-chains is the feature that allows them to be conserved among donors (20); sharing among donors is not predicted for TCRs with complex joints that incorporate multiple N nucleotides. As a control for cross-contamination of samples, we therefore generated a list of interdonor-conserved TCR α-chains with more than three N nucleotides (Supplemental Fig. 2). With one exception, all sequences on that list are MAIT sequences; this is expected, because MAIT cells, regardless of the number of N nucleotides they use, are expanded in vivo due to Ag exposure. Because only one non-MAIT sequence with more than three N nucleotides was shared among donors, as opposed to 16 non-MAIT invariant α-chains with three or fewer, we conclude that cross-contamination is not the driving force behind the identification of our 16 new invariant TCR α-chains.
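The logic of this control is that, if contamination drove sharing, sequences with many N nucleotides would be shared among donors about as often as those with few. A minimal Python sketch follows that counts interdonor-shared, non-MAIT CDR3s in the two N-nucleotide classes. The record format, the function name, the toy data, and the requirement of at least two donors for "shared" are assumptions, because the text does not state the sharing threshold used for this control.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (donor, CDR3 amino acid sequence, number of N nucleotides, is MAIT alpha-chain)
Record = Tuple[str, str, int, bool]


def shared_counts_by_n(records: Iterable[Record], min_donors: int = 2,
                       max_low_n: int = 3) -> Dict[str, int]:
    """Count non-MAIT CDR3s found in >= min_donors donors, split by N-nucleotide class."""
    donors = defaultdict(set)
    for donor, cdr3_aa, n_count, is_mait in records:
        if is_mait:
            continue  # MAIT chains are expanded in vivo regardless of N count
        n_class = "low_n" if n_count <= max_low_n else "high_n"
        donors[(cdr3_aa, n_class)].add(donor)
    counts = {"low_n": 0, "high_n": 0}
    for (_, n_class), ids in donors.items():
        if len(ids) >= min_donors:
            counts[n_class] += 1
    return counts


toy = [  # hypothetical records
    ("d1", "CAVRDGNKLVF", 1, False), ("d2", "CAVRDGNKLVF", 0, False),
    ("d1", "CAVQGGSEKLVF", 6, False), ("d2", "CAVTPWGGSYIPTF", 7, False),
    ("d1", "CAAMDSNYQLIWGAG", 5, True), ("d2", "CAAMDSNYQLIWGAG", 5, True),
]
print(shared_counts_by_n(toy))  # {'low_n': 1, 'high_n': 0}
```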

Public TCRs are defined as MHC-restricted TCRs that are shared among donors who share at least one MHC allele and relevant Ag exposure. To assess whether any of the 16 new invariant α-chains are public TCRs, we HLA typed five of the six subjects used to generate the TRAV1-2 dataset and asked whether the occurrence of any of the new invariant α-chains was limited to subjects that express the same MHC proteins. For HLA-A, HLA-B, and HLA-DRB1, no protein was shared by more than two donors; for HLA-C, HLA-DQ, HLA-DRB3, and HLA-DRB4, no protein was shared by more than three donors (Table I, Supplemental Fig. 3). Because 4 of the 16 α-chains were present in four of the five subjects with available HLA typing, and the other 12 α-chains were present in all five (Supplemental Fig. 3), none of the 16 new invariant α-chains is limited to subjects that share MHC alleles; thus, none fulfills the criteria for public TCRs.
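The argument rests on a simple set comparison. An α-chain could only be restricted by a shared allele if some allele were carried by every subject in which the chain occurs. The Python sketch below uses the HLA-C alleles transcribed from Table I for the five typed subjects. The function names and the example sets of subjects in which a chain occurs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# HLA-C alleles per typed subject, transcribed from Table I (C28 was not typed).
HLA_C: Dict[str, Set[str]] = {
    "C34": {"06:02", "15:09"},
    "C52": {"04:01"},
    "C58": {"07:01", "12:03"},
    "B36": {"06:02", "07:01"},
    "B38": {"02:02", "07:01"},
}


def max_donors_sharing(typing: Dict[str, Set[str]]) -> int:
    """Largest number of subjects carrying any single allele at this locus."""
    carriers = defaultdict(set)
    for subject, alleles in typing.items():
        for allele in alleles:
            carriers[allele].add(subject)
    return max(len(subjects) for subjects in carriers.values())


def alleles_common_to(subjects: Iterable[str], typing: Dict[str, Set[str]]) -> Set[str]:
    """Alleles carried by every subject in which an alpha-chain was detected."""
    allele_sets = [typing[s] for s in subjects]
    return set.intersection(*allele_sets) if allele_sets else set()


print(max_donors_sharing(HLA_C))                               # 3 (07:01)
print(alleles_common_to(["C34", "C52", "C58", "B36"], HLA_C))  # set()
```

Because no HLA-C allele is carried by more than three typed subjects, a chain present in four or more of them cannot be confined to carriers of a single HLA-C allele. The same check applies locus by locus.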

The two main factors that contribute to the prevalence of MAIT and NKT α-chains are convergent recombination and in vivo T cell expansion. Convergent recombination causes high precursor frequencies of certain TCR α-chains because many independent ways of joining a V and J segment lead to the same amino acid sequence (20–22), whereas T cell expansion increases the number of daughter cells of individual T cell clones (23). Illustrating known mechanisms of convergent recombination, we identified examples in our dataset of identical nucleotide sequences in which nucleotides could have been contributed by either the V or the J gene without the addition of N nucleotides, allowing identical nucleotide sequences to form from germline-encoded gene segments (Fig. 2A). We also identified different nucleotide sequences that encode the same TCR α-chain amino acid sequence, as well as amino acid variants of the consensus sequence. Fig. 2A shows this analysis for two of the 16 CDR3 motifs of newly identified invariant α-chains. To obtain an overall indication of the extent of convergent recombination among the newly identified invariant α-chains, we assessed how many different nucleotide sequences (clones) encode each MAIT, GEM, and new invariant TCR α-chain, and compared this with the other TRAV1-2 α-chains in our dataset (Supplemental Fig. 1, Fig. 2B). The data show that MAIT and GEM α-chains are more often subject to convergent recombination than the newly identified invariant α-chains and the other TRAV1-2 α-chains in our dataset.
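Counting how many distinct nucleotide clones encode each amino acid sequence, the measure plotted in Fig. 2B, can be sketched in Python as below. The codon table is the standard genetic code. The function name and toy sequences are hypothetical. The sketch assumes that the inputs are in-frame CDR3 nucleotide sequences that have already been deduplicated.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Standard genetic code, built from the usual TCAG-ordered 64-letter string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt: str) -> str:
    """Translate an in-frame nucleotide sequence codon by codon."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def clones_per_peptide(clone_nts: Iterable[str]) -> Dict[str, int]:
    """Number of distinct nucleotide sequences (clones) encoding each peptide."""
    groups = defaultdict(set)
    for nt in clone_nts:
        groups[translate(nt)].add(nt)
    return {aa: len(nts) for aa, nts in groups.items()}


toy = ["TGTGCTGTG", "TGCGCTGTG", "TGTGCAGTC", "TGTGCTCTG"]  # hypothetical clones
print(clones_per_peptide(toy))  # {'CAV': 3, 'CAL': 1}
```

A peptide encoded by many distinct nucleotide clones is a signature of convergent recombination, whereas in vivo expansion instead increases the read count of a single clone.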

FIGURE 2.

Convergent recombination in invariant α-chains. (A) Proposed mechanisms of convergent recombination for two representative new invariant α-chains from Fig. 1D. All nucleotide sequences shown are present in the TRAV1-2 dataset. Nucleotides that can be attributed to the germline TRAV1-2 segment are shown in cyan, to TRAJ30 in pink, and to TRAJ4 in yellow. Amino acids that occupy the variable position in the motif are shown in red. (B) The number of unique clones that encode each peptide in the TRAV1-2 dataset was determined. The black line represents the mean number of clones per α-chain for GEM, MAIT, new invariant, and all other α-chains. The boxplot shows the upper and lower quartiles, the whiskers indicate the 2nd and 98th percentiles, and the blue dots represent outliers.

Homeostatic or cytokine-driven T cell expansion causes all T cells that express certain cytokine receptors, or stimulatory receptors other than the TCR, to expand, whereas Ag-driven T cell expansion causes T cell clones with a particular Ag-specific TCR to expand upon specific stimulation of that TCR. NKT and MAIT cells are subject to both mechanisms (23–25), whereas GEM T cells appear to be subject at least to Ag-driven expansion (12). Insight into these mechanisms is relevant to the question of whether a T cell recognizes a specific exogenous Ag and expands upon encountering that Ag in vivo, or whether it has a function in homeostasis or in the amplification of signals from other immune cells. To assess expansion in our dataset, we plotted the read frequencies of clones encoding GEM, MAIT, or other α-chains (Fig. 3A) and of clones encoding new invariant α-chains (Fig. 3B), and classified them as low, medium, or highly expanded. As expected, clones with MAIT α-chains are among the highly expanded clones (clones that form >1% of the TRAV1-2 repertoire). Clones with GEM α-chains include one highly expanded clone, but mostly represent medium expanded clones (clones that form between 0.1 and 1% of the TRAV1-2 repertoire).

FIGURE 3.

Clonal size of invariant α-chains as a measure of T cell expansion. (A and B) The clonal size of each clone that is present in each sample of the TRAV1-2 dataset is expressed as the percentage of the total number of reads. (A) MAIT clones are shown in green, the two types of GEM clones in red and blue, and other clones in light gray. (B) Only the clones that represent a new invariant α-chain are shown. Each color represents 1 of the 16 new invariant α-chains. All dots with the same color represent nucleotide sequences that encode the same amino acid sequence. (C) The quantitative contribution by the 16 newly identified invariant α-chains to the total TRAV1-2+ population is shown, as well as the contribution by MAIT cells, GEM cells, and other TCR α-chains.

GEM TCR α-chains are defined as using TRAV1-2 and TRAJ9 to form the CDR3 sequence CAVRXTGGFKTIF or CAVLXTGGFKTIF (12). Individually, these variants do not pass the filter requiring detection in at least 6 of the 12 samples, but together they are present in 8 samples (Supplemental Fig. 1B).

Of note, subjects C28, C34, C52, and C58 are latent tuberculosis patients, whereas subjects B36 and B38 are random blood bank donors. Because Mycobacterium tuberculosis produces glucose monomycolate and mycolic acid, the Ags for GEM T cells, we hypothesized that GEM T cell clones in tuberculosis patients have undergone Ag-driven expansion, whereas those in random blood donors, who are less likely to have been exposed to these Ags, have not. However, we observed expanded clones with GEM TCR α-chains in all subjects (Fig. 3A). Nevertheless, the reads of all GEM TCR α clones together form a bigger fraction of the total TRAV1-2 repertoire in the tuberculosis patients (0.22, 13.6, 0.49, and 0.33%) than in the random blood bank donors (0.09 and 0.47%). A carefully designed study with more subjects and confirmed tuberculosis-unexposed control subjects is needed to confirm this preliminary observation.

Among the clones that represent the 16 new invariant α-chain motifs, many fall into the categories of medium and highly expanded clones (Fig. 3B). In addition to the frequency of individual T cell clones with invariant α-chains (Fig. 3B), we determined the fraction of the TRAV1-2 population formed by all clones that fit one motif combined (Fig. 3C). Because TRAV1-2+ T cells typically form 1–2% of the total T cell population, these data give insight into the prevalence of these T cells in blood. As expected, the frequency of MAIT α-chains is lower among CD4+ cells than among CD4− cells in all donors (11, 24). Consistent with findings by Gold et al. (17), MAIT populations in our dataset are smaller in tuberculosis patients than in healthy blood bank donors (Fig. 3C); migration into infected tissues has been suggested to account for this effect. Together, our quantitative data on MAIT and GEM cells are consistent with published findings, and the data on the newly described invariant α-chains show large clonal sizes for some of these clones, indicating Ag-driven expansion in vivo.
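The per-motif fractions in Fig. 3C, and their conversion to a share of all T cells, amount to two simple calculations. First, sum the read percentages of all clones assigned to a category. Then multiply by the TRAV1-2+ share of T cells (1–2%, as stated above). A Python sketch follows. The category assignments and the 5% example are hypothetical, not values from the dataset.

```python
from typing import Callable, Dict, Hashable


def category_fractions(clone_percent: Dict[Hashable, float],
                       category_of: Callable[[Hashable], str]) -> Dict[str, float]:
    """Sum clone percentages (of TRAV1-2 reads) per category, e.g. MAIT, GEM, motif."""
    totals: Dict[str, float] = {}
    for clone, pct in clone_percent.items():
        cat = category_of(clone)
        totals[cat] = totals.get(cat, 0.0) + pct
    return totals


def share_of_all_t_cells(pct_of_trav1_2: float, trav1_2_share_pct: float) -> float:
    """Convert a % of the TRAV1-2+ population into a % of all T cells."""
    return pct_of_trav1_2 * trav1_2_share_pct / 100.0


labels = {"a": "MAIT", "b": "motif_1", "c": "motif_1", "d": "other"}  # hypothetical
print(category_fractions({"a": 2.0, "b": 0.5, "c": 0.25, "d": 97.25}, labels.get))
# A hypothetical motif forming 5% of TRAV1-2 reads, with TRAV1-2+ cells at 1-2% of T cells:
print(share_of_all_t_cells(5.0, 1.0), share_of_all_t_cells(5.0, 2.0))  # 0.05 0.1
```

In this hypothetical case, a motif that looks prominent within the TRAV1-2 repertoire still corresponds to only about 0.05–0.1% of all circulating T cells.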

The discovery of NKT cells, MAIT cells, and GEM T cells was a stepwise process, performed by many different research groups using different techniques (12, 16, 26, 27). The fact that these cells express invariant TCR α-chains was discovered after their functional characterization. In this study, we show that an approach based on a search for invariant α-chains is quick and valid: the α-chains of MAIT cells and of a recently described alternative form of MAIT cells passed our set of filters designed to identify invariant TCR α-chains. NKT cells do not express the TRAV1-2 V segment on which our search focused, so they were not identified. Our approach led to the identification of 16 new invariant TCR α-chain motifs.

Our dataset is not particularly suitable for identifying invariant α-chains that are relatively rare in blood or that exist as a group of variants. For example, the two known GEM variants (CDR3 sequences CAVRXTGGFKTIF and CAVLXTGGFKTIF) (12) did not pass the filters individually. TRAJ33-expressing MAIT α-chains also exist as variants that diverge from the MAIT consensus sequence at two positions, but because MAIT cells are highly abundant in the human T cell repertoire, many different variants passed the filters individually. Of the recently described atypical TRAJ12- or TRAJ20-expressing MAIT TCR α-chains (11, 17, 19), the former passed our filters, but the latter did not. This suggests that our criteria were relatively stringent. Because these stringent filters nevertheless recovered canonical and atypical MAIT α-chains, our list of 16 new invariant α-chain motifs may include T cell populations that recognize nonpolymorphic Ag-presenting molecules. Isolation and functional characterization of T cells with these invariant α-chains are necessary to determine whether they recognize nonpolymorphic Ag-presenting molecules, or whether there is an alternative explanation for their interdonor conservation and expansion in vivo.

NKT cells and MAIT cells are often called innate-like T cells to distinguish them from classical T cells, which are considered to form the core of the adaptive immune system. Upon primary exposure to a pathogen-derived Ag in vivo, naive classical T cells expand and become effector and subsequently memory T cells. Even after many years, the response to a subsequent encounter with the Ag is much quicker and more effective than the response to the first encounter. In contrast, NKT cells and MAIT cells do not exist as naive, unexpanded populations and are not known to exhibit distinguishable primary and recall responses. For GEM T cells and T cells that use the newly described invariant α-chains, it is unknown whether they are closer to NKT and MAIT cells or behave like classical memory-forming T cells. However, our data are consistent with classical memory-forming behavior, because GEM T cells and T cells that use the newly described invariant α-chains were generally not highly expanded, with the notable exception of single clones that were clearly expanded.

Distal V genes (with low numbers in their gene name) are more likely than proximal V genes (with high numbers in their gene name) to recombine with distal J genes (with low numbers in their gene name) (28, 29). Recombination between distal V and J segments tends to occur later than recombination between proximal V and J segments, and therefore takes place when less TdT is available to insert N nucleotides (28–30). Of note, NKT, MAIT, and GEM T cells all use distal V and J segments in their TCR α-chains: TRAV10/TRAJ18, TRAV1-2/TRAJ33, and TRAV1-2/TRAJ9, respectively. The new invariant α-chains described in this work use TRAV1-2, which occupies the most distal location of the TRAV/DV locus, and TRAJ4, 9, 15, 16, 20, 26, 27, 28, 30, 31, 34, and 39, which, among the 61 J gene segments in the human genome, are considered distal. Thus, recombination of the gene segments used by the new invariant α-chains is predicted to take place late and under circumstances that bias against the insertion of N nucleotides. This seems an efficient way to generate functional TCRs in T cells that have previously undergone nonproductive recombination events, and it might relate to recognition of nonpolymorphic Ag-presenting molecules, as is the case for the known invariant TCRs composed of distal gene segments (NKT, MAIT, and GEM TCRs). Therefore, future functional characterization may reveal recognition of nonpolymorphic Ag-presenting molecules and nonpeptidic Ags among the 16 invariant T cells we have identified in this study.

Currently, the human TCR repertoire is viewed as a sea of highly diverse TCRs that recognize peptide-MHC, from which the NKT, MAIT, and GEM TCRs stand out as rare islands of conservation and unconventional specificity. Our approach of next-generation TCR α-chain sequencing and analysis, applied to only 1 of the 45 functional human Vα genes, expands the number of known invariant TCRs to 19. These data point to a new view of the repertoire in which many types of invariant T cells exist.
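The central step of finding interdonor-conserved α-chains can be sketched as counting, for each clonotype, how many donors carry it. The sketch below assumes each donor's repertoire has already been reduced to (TRAV, TRAJ, CDR3) tuples; the function name, the placeholder CDR3 labels, and the `min_donors` threshold are hypothetical, and this is not the analysis pipeline used in this study.

```python
from collections import defaultdict


def shared_clonotypes(donor_repertoires, min_donors=2):
    """Return clonotypes found in at least `min_donors` donors.

    donor_repertoires maps a donor ID to an iterable of clonotype keys,
    assumed here to be (TRAV, TRAJ, CDR3) tuples. The result maps each
    shared clonotype to the sorted list of donors that carry it.
    """
    donors_per_clone = defaultdict(set)
    for donor, clonotypes in donor_repertoires.items():
        for clone in set(clonotypes):  # count each donor once per clonotype
            donors_per_clone[clone].add(donor)
    return {
        clone: sorted(donors)
        for clone, donors in donors_per_clone.items()
        if len(donors) >= min_donors
    }


# Hypothetical input with placeholder CDR3 labels.
repertoires = {
    "donor1": [("TRAV1-2", "TRAJ33", "cdr3_x"), ("TRAV1-2", "TRAJ20", "cdr3_y")],
    "donor2": [("TRAV1-2", "TRAJ33", "cdr3_x")],
    "donor3": [("TRAV1-2", "TRAJ33", "cdr3_x"), ("TRAV1-2", "TRAJ9", "cdr3_z")],
}
print(shared_clonotypes(repertoires))
# {('TRAV1-2', 'TRAJ33', 'cdr3_x'): ['donor1', 'donor2', 'donor3']}
```

Keying on the nucleotide rather than the amino acid CDR3 sequence, or additionally requiring few N nucleotides, would give stricter variants; which criteria suit a given dataset is left open here.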

We thank Dr. M. Carrington, Dr. X.J. Gao, and Dr. M.P. Martin at the Frederick National Laboratory for Cancer Research (Frederick, MD) for HLA typing.

This work was supported by National Institute of Allergy and Infectious Diseases Grants AI049313 and AR048632 (to D.B.M.), Burroughs Wellcome Fund for Translational Research, and Nederlands Wetenschappelijk Onderzoek (Meervoud 836.08.001 to I.V.R.).

The TRAV1-2 dataset presented in this article has been submitted to the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra/) under accession number SRP044904.

The online version of this article contains supplemental material.

Abbreviations used in this article: GEM, germline-encoded mycolyl lipid–reactive; MAIT, mucosal-associated invariant T; N, nontemplated.

1. Su, C., I. Jakobsen, X. Gu, and M. Nei. 1999. Diversity and evolution of T-cell receptor variable region genes in mammals and birds. Immunogenetics 50: 301–308.
2. Haynes, M. R., and G. E. Wu. 2004. Evolution of the variable gene segments and recombination signal sequences of the human T-cell receptor alpha/delta locus. Immunogenetics 56: 470–479.
3. Wang, C., C. M. Sanders, Q. Yang, H. W. Schroeder, Jr., E. Wang, F. Babrzadeh, B. Gharizadeh, R. M. Myers, J. R. Hudson, Jr., R. W. Davis, and J. Han. 2010. High throughput sequencing reveals a complex pattern of dynamic interrelationships among human T cell subsets. Proc. Natl. Acad. Sci. USA 107: 1518–1523.
4. Moonsamy, P. V., T. Williams, P. Bonella, C. L. Holcomb, B. N. Höglund, G. Hillman, D. Goodridge, G. S. Turenchalk, L. A. Blake, D. A. Daigle, et al. 2013. High throughput HLA genotyping using 454 sequencing and the Fluidigm Access Array™ System for simplified amplicon library preparation. Tissue Antigens 81: 141–149.
5. Klarenbeek, P. L., P. P. Tak, B. D. van Schaik, A. H. Zwinderman, M. E. Jakobs, Z. Zhang, A. H. van Kampen, R. A. van Lier, F. Baas, and N. de Vries. 2010. Human T-cell memory consists mainly of unexpanded clones. Immunol. Lett. 133: 42–48.
6. Klarenbeek, P. L., M. J. de Hair, M. E. Doorenspleet, B. D. van Schaik, R. E. Esveldt, M. G. van de Sande, T. Cantaert, D. M. Gerlag, D. Baeten, A. H. van Kampen, et al. 2012. Inflamed target tissue provides a specific niche for highly expanded T-cell clones in early human autoimmune disease. Ann. Rheum. Dis. 71: 1088–1093.
7. Boria, I., D. Cotella, I. Dianzani, C. Santoro, and D. Sblattero. 2008. Primer sets for cloning the human repertoire of T cell receptor variable regions. BMC Immunol. 9: 50.
8. Lefranc, M. P., V. Giudicelli, C. Ginestoux, J. Jabado-Michaloud, G. Folch, F. Bellahcene, Y. Wu, E. Gemrot, X. Brochet, J. Lane, et al. 2009. IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res. 37: D1006–D1012.
9. Kent, W. J. 2002. BLAT—the BLAST-like alignment tool. Genome Res. 12: 656–664.
10. Lefranc, M. P., C. Pommié, M. Ruiz, V. Giudicelli, E. Foulquier, L. Truong, V. Thouvenin-Contet, and G. Lefranc. 2003. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev. Comp. Immunol. 27: 55–77.
11. Reantragoon, R., A. J. Corbett, I. G. Sakala, N. A. Gherardin, J. B. Furness, Z. Chen, S. B. Eckle, A. P. Uldrich, R. W. Birkinshaw, O. Patel, et al. 2013. Antigen-loaded MR1 tetramers define T cell receptor heterogeneity in mucosal-associated invariant T cells. J. Exp. Med. 210: 2305–2320.
12. Van Rhijn, I., A. Kasmar, A. de Jong, S. Gras, M. Bhati, M. E. Doorenspleet, N. de Vries, D. I. Godfrey, J. D. Altman, W. de Jager, et al. 2013. A conserved human T cell population targets mycobacterial antigens presented by CD1b. Nat. Immunol. 14: 706–713.
13. Sieling, P. A., D. Chatterjee, S. A. Porcelli, T. I. Prigozy, R. J. Mazzaccaro, T. Soriano, B. R. Bloom, M. B. Brenner, M. Kronenberg, P. J. Brennan, et al. 1995. CD1-restricted T cell recognition of microbial lipoglycan antigens. Science 269: 227–230.
14. Sieling, P. A., M. T. Ochoa, D. Jullien, D. S. Leslie, S. Sabet, J. P. Rosat, A. E. Burdick, T. H. Rea, M. B. Brenner, S. A. Porcelli, and R. L. Modlin. 2000. Evidence for human CD4+ T cells in the CD1-restricted repertoire: derivation of mycobacteria-reactive T cells from leprosy lesions. J. Immunol. 164: 4790–4796.
15. Lantz, O., and A. Bendelac. 1994. An invariant T cell receptor alpha chain is used by a unique subset of major histocompatibility complex class I-specific CD4+ and CD4-8- T cells in mice and humans. J. Exp. Med. 180: 1097–1106.
16. Tilloy, F., E. Treiner, S. H. Park, C. Garcia, F. Lemonnier, H. de la Salle, A. Bendelac, M. Bonneville, and O. Lantz. 1999. An invariant T cell receptor alpha chain defines a novel TAP-independent major histocompatibility complex class Ib-restricted alpha/beta T cell subpopulation in mammals. J. Exp. Med. 189: 1907–1921.
17. Gold, M. C., S. Cerri, S. Smyk-Pearson, M. E. Cansler, T. M. Vogt, J. Delepine, E. Winata, G. M. Swarbrick, W. J. Chua, Y. Y. Yu, et al. 2010. Human mucosal associated invariant T cells detect bacterially infected cells. PLoS Biol. 8: e1000407.
18. Lepore, M., A. Kalinichenko, A. Colone, B. Paleja, A. Singhal, A. Tschumi, B. Lee, M. Poidinger, F. Zolezzi, L. Quagliata, et al. 2014. Parallel T-cell cloning and deep sequencing of human MAIT cells reveal stable oligoclonal TCRβ repertoire. Nat. Commun. 5: 3866.
19. Legg, K. 2014. Cytokines: tipping TB off balance. Nat. Rev. Immunol. 14: 516–517.
20. Greenaway, H. Y., B. Ng, D. A. Price, D. C. Douek, M. P. Davenport, and V. Venturi. 2013. NKT and MAIT invariant TCRalpha sequences can be produced efficiently by VJ gene recombination. Immunobiology 218: 213–224.
21. Venturi, V., D. A. Price, D. C. Douek, and M. P. Davenport. 2008. The molecular basis for public T-cell responses? Nat. Rev. Immunol. 8: 231–238.
22. Li, H., C. Ye, G. Ji, X. Wu, Z. Xiang, Y. Li, Y. Cao, X. Liu, D. C. Douek, D. A. Price, and J. Han. 2012. Recombinatorial biases and convergent recombination determine interindividual TCRβ sharing in murine thymocytes. J. Immunol. 189: 2404–2413.
23. Matsuda, J. L., L. Gapin, N. Fazilleau, K. Warren, O. V. Naidenko, and M. Kronenberg. 2001. Natural killer T cells reactive to a single glycolipid exhibit a highly diverse T cell receptor beta repertoire and small clone size. Proc. Natl. Acad. Sci. USA 98: 12636–12641.
24. Martin, E., E. Treiner, L. Duban, L. Guerri, H. Laude, C. Toly, V. Premel, A. Devys, I. C. Moura, F. Tilloy, et al. 2009. Stepwise development of MAIT cells in mouse and human. PLoS Biol. 7: e54.
25. Gold, M. C., T. Eid, S. Smyk-Pearson, Y. Eberling, G. M. Swarbrick, S. M. Langley, P. R. Streeter, D. A. Lewinsohn, and D. M. Lewinsohn. 2013. Human thymic MR1-restricted MAIT cells are innate pathogen-reactive effectors that adapt following thymic egress. Mucosal Immunol. 6: 35–44.
26. Treiner, E., L. Duban, S. Bahram, M. Radosavljevic, V. Wanner, F. Tilloy, P. Affaticati, S. Gilfillan, and O. Lantz. 2003. Selection of evolutionarily conserved mucosal-associated invariant T cells by MR1. Nature 422: 164–169.
27. Godfrey, D. I., H. R. MacDonald, M. Kronenberg, M. J. Smyth, and L. Van Kaer. 2004. NKT cells: what’s in a name? Nat. Rev. Immunol. 4: 231–237.
28. Huang, C., and O. Kanagawa. 2001. Ordered and coordinated rearrangement of the TCR alpha locus: role of secondary rearrangement in thymic selection. J. Immunol. 166: 2597–2601.
29. Roth, M. E., P. O. Holman, and D. M. Kranz. 1991. Nonrandom use of J alpha gene segments: influence of V alpha and J alpha gene location. J. Immunol. 147: 1075–1081.
30. Shimamura, M., J. Miura-Ohnuma, and Y. Y. Huang. 2001. Major sites for the differentiation of V alpha 14(+) NKT cells inferred from the V-J junctional sequences of the invariant T-cell receptor alpha chain. Eur. J. Biochem. 268: 56–61.

The authors have no financial conflicts of interest.
