Abstract
The CD8+ TCR repertoires specific for many immunogenic epitopes of CMV and EBV are dominated by a few TCR clonotypes and involve public TCRs that are shared between many MHC-matched individuals. In previous studies, we demonstrated that the observed sharing of epitope-specific TCRβ chains between individuals is strongly associated with TCRβ production frequency, and that a process of convergent recombination facilitates the more efficient production of some TCRβ sequences. In this study, we analyzed a total of 2836 TCRβ sequences from 23 CMV-infected and 10 EBV-infected individuals to investigate the factors that influence the sharing of TCRβ sequences in the CD8+ T cell responses to two immunodominant HLA-A*0201-restricted epitopes from these viruses. The most shared TCRβ amino acid sequences were found to have two features that indicate efficient TCRβ production, as follows: 1) they required fewer nucleotide additions, and 2) they were encoded by a greater variety of nucleotide sequences. We used simulations of random V(D)J recombination to demonstrate that the in silico TCRβ production frequency was predictive of the extent to which both TCRβ nucleotide and amino acid sequences were shared in vivo. These results suggest that TCRβ production frequency plays an important role in the interindividual sharing of TCRβ sequences within CD8+ T cell responses specific for CMV and EBV.
T cell recognition of a large variety of antigenic peptides bound to different MHC molecules relies on a diverse repertoire of TCRs. TCR α- and β-chains are produced by recombination of germline V, D (for β-chains only), and J genes. Nontemplate nucleotides added in the junctions between truncated germline genes also contribute to the generation of a diverse TCR repertoire. Overall, this process has the potential to generate an enormous diversity (>10^15) (1) of TCRαβs in the thymus, although only a small proportion of these survives thymic selection. The human body accommodates ∼10^12 T cells in the periphery at any given time, consisting of ∼10^7 different TCRαβs (2).
The two herpes viruses, CMV and EBV, infect the majority of the population. Acute CMV and EBV infections are characterized by massive proliferation of epitope-specific CD8+ T cells (3, 4), which largely control primary viral replication (5, 6). Following acute infection, the virus-specific CD8+ T cell population contracts and a memory T cell population is formed. However, these viruses are not completely cleared and persist within infected individuals, with myeloid progenitor cells and B cells providing reservoirs for latent CMV and EBV, respectively. Some studies have suggested that senescence of highly expanded virus-specific CD8+ T cells plays an important role in the contraction phase (4, 7, 8). However, it has also been suggested that some epitope-specific CD8+ T cells observed in the response to primary infection are maintained through the persistent phase of infection (9, 10).
In immunocompetent individuals, CMV and EBV infections are largely asymptomatic. In individuals infected later in life, EBV may present in acute infection as infectious mononucleosis. However, both CMV and EBV can be serious opportunistic infections in individuals whose immune defenses are compromised by the immaturity of the neonatal immune system, other infections such as HIV, or posttransplantation immunosuppression. CMV and EBV also have a substantial impact upon the health of the aged population. Large oligoclonal expansions of CMV- or EBV-specific CD8+ T cells can, over time, lead to a substantial proportion of the total repertoire being occupied by T cells specific for these viruses (11, 12, 13, 14). This predominance of CMV- or EBV-specific CD8+ T cells can lead to an underrepresentation of T cells with other specificities, and can thus compromise the ability of the T cell repertoire in aged individuals to fight other infections (15).
Public TCR sequences (i.e., identical TCR amino acid sequences observed in many MHC-matched individuals) have been observed in many studies of the CD8+ T cell responses to specific CMV epitopes (9, 11, 16, 17, 18, 19, 20, 21) and EBV epitopes (22, 23, 24, 25, 26, 27, 28, 29). Such public TCRs are unexpected, because the potential diversity of the thymic TCR repertoire (>10^15) (1) greatly exceeds the estimated diversity of the peripheral TCR repertoire (∼10^7) (2). The observed sharing of CMV and EBV epitope-specific TCRβ between individuals is even more surprising in the earlier studies that involved only a few donors and few sequences per donor. It has been suggested that public TCRs play a role in the focusing of these virus-specific TCR repertoires over time (9). To better understand Ag-driven selection of the focused TCR repertoires that respond to specific CMV or EBV epitopes, many structural studies of peptide-MHC complexes and/or TCRs have been performed (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36). These studies suggest that the shape of the peptide-MHC complex (27, 28, 29, 33, 36), the ability of the TCR to undergo binding-induced structural changes (26, 30, 32), and the ability of the TCR to adjust the peptide-MHC shape on interaction (36) are possible factors in the selection of the restricted TCR repertoires observed responding to specific CMV or EBV epitopes.
We have previously studied the sharing of TCRβ sequences in CD8+ T cell responses to influenza virus in mice (37) and SIV in rhesus macaques (38). We demonstrated that the efficiency with which TCRβ sequences are produced via V(D)J recombination is an important determinant of the extent of TCRβ sharing between individuals (37, 39). We showed that TCRβ amino acid sequences found in more individuals required fewer nucleotide additions and were encoded by a greater variety of nucleotide sequences. Both of these features are good identifiers of TCRβ sequences that have the potential to be produced frequently (37, 38, 39, 40) and are also observed characteristics of many public TCRs (19, 20, 22, 25, 37, 38, 40). Moreover, these features suggest that a process of convergent recombination drives variable TCR production frequencies. Convergent recombination encompasses the production of a nucleotide sequence by a variety of recombination mechanisms (i.e., different germline gene contributions and nucleotide additions) and the production of an amino acid sequence by a variety of nucleotide sequences. It also accounts for the frequent occurrence of some V(D)J recombination events due to the involvement of fewer nucleotide additions (37, 38, 40).
In the present study, we investigate the sharing of TCRβ sequences in HLA-A*0201-restricted CD8+ T cell responses to the CMV phosphoprotein pp65(495–503) epitope (NLVPMVATV; referred to hereafter as CMV-NV9) and the EBV BMLF1(259–267) epitope (GLCTLVAML; referred to hereafter as EBV-GL9). Both the CMV-NV9 and EBV-GL9 epitopes elicit strong peripheral CD8+ T cell responses during the chronic phases of their respective infections (reviewed in Refs. 41 and 42). Public TCRβ sequences have been observed previously in both the CMV-NV9-specific (11, 16, 18, 19) and EBV-GL9-specific (19, 24, 25) CD8+ T cell responses. Thus, these epitope-specific TCRβ repertoires are good candidates for studying the factors that determine the sharing of TCRs between individuals.
Materials and Methods
TCR repertoire data
The TCRβ repertoire data for CD8+ T cell responses to the CMV-NV9 and EBV-GL9 epitopes in HLA-A*0201+ individuals consist of published (19) and unpublished sequences obtained from both healthy and HIV-infected donors. To obtain these TCRβ sequences, Ag-specific CD8+ T cells were identified using peptide-MHC class I tetramers and sorted by flow cytometry to purity levels >98%. All expressed TCRB gene products were then amplified without bias using a strand-switch RT-PCR, subcloned, and sequenced, as described in detail in the original publication (19). To ensure a consistent approach to the analysis of published and unpublished sequences, the data from the published study (19) were reanalyzed in parallel with the new sequences (resulting in some minor differences between this and the published TCRβ repertoires (19)).
Human germline genes
In this study, we used the reference sequences (i.e., the *01 alleles) for the human germline Vβ, Dβ, and Jβ genes obtained from the international ImMunoGeneTics information system (http://imgt.cines.fr/home.html) (43). ImMunoGeneTics nomenclature is used when referring to all TCR genes.
Identification of the germline Vβ and Jβ genes for each TCRβ sequence
The germline Vβ and Jβ genes involved in the production of each TCRβ sequence were identified as the highest percentage-matched gene to align with the TCRβ sequence. Due to complete identity within the CDR3 regions and the overall homology between the TRBV12-3 and TRBV12-4 genes (97.7% of 347 bp), and the TRBV6-2 and TRBV6-3 genes (99.7% of 344 bp), it was often difficult to distinguish the gene within each pair that was used by the TCRβ sequences. Thus, for the purpose of this study, we attributed the Vβ gene usage of TCRβ sequences using the TRBV12-3 and TRBV12-4 genes to only the TRBV12-4 gene. Similarly, we attributed the gene usage of the TRBV6-2 and TRBV6-3 genes to only the TRBV6-3 gene. The inability to distinguish between usage of these genes does not affect the analysis of the V(D)J recombination mechanisms due to their identity within CDR3.
Analysis of the sharing of TCRβ sequences
TCRβ sequences were considered shared when they were observed in an epitope-specific response in more than one individual. The identity of TCRβ sequences between individuals required an identical CDR3β sequence together with identical Vβ and Jβ gene usage. The number of individuals in which epitope-specific TCRβ sequences were found provided a relative measure of the sharing of epitope-specific TCRβ sequences observed in the individuals considered in this study.
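The sharing criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (individual, TRBV, CDR3β, TRBJ), the function names, and the donor labels are hypothetical and are not taken from the original analysis. The sketch counts each clonotype once per individual, so clonal dominance does not affect the sharing measure.

```python
from collections import defaultdict

def count_sharing(records):
    """Map each TCRβ clonotype to the number of individuals carrying it.

    `records` is an iterable of (individual, trbv, cdr3, trbj) tuples; this
    layout is an assumption made for the sketch. Identity requires the same
    CDR3β sequence together with the same Vβ and Jβ gene usage. Repeated
    copies within one individual are counted once.
    """
    carriers = defaultdict(set)
    for individual, trbv, cdr3, trbj in records:
        carriers[(trbv, cdr3, trbj)].add(individual)
    return {clonotype: len(people) for clonotype, people in carriers.items()}

def shared_only(sharing):
    """Keep only clonotypes observed in more than one individual."""
    return {c: n for c, n in sharing.items() if n > 1}

# Toy records: donor labels are invented, and the TRBV7-6 pairing in the last
# row is hypothetical, included only to show that a different Vβ gene yields
# a different clonotype even when the CDR3β is identical.
records = [
    ("donor1", "TRBV12-4", "CASSSANYGYT", "TRBJ1-2"),
    ("donor1", "TRBV12-4", "CASSSANYGYT", "TRBJ1-2"),
    ("donor2", "TRBV12-4", "CASSSANYGYT", "TRBJ1-2"),
    ("donor2", "TRBV27", "CASSLEGYTEAF", "TRBJ1-1"),
    ("donor3", "TRBV7-6", "CASSSANYGYT", "TRBJ1-2"),
]
sharing = count_sharing(records)
```

Using a set of individuals per clonotype is what dissociates sharing from clonal dominance: the two copies in donor1 contribute a single carrier.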
Estimating the number of nucleotide additions
The minimum number of nucleotide additions required to produce a TCRβ sequence was determined by first aligning the germline Vβ gene at the 5′ end of the TCRβ sequence and then the Jβ gene at the 3′ end of the TCRβ sequence. The germline Dβ genes were subsequently aligned with nucleotides in the junction between the identified Vβ and Jβ regions. No fewer than 2 nt were attributed to a Dβ gene segment. It was also assumed that only the TRBD1 gene was involved in gene recombinations with the TRBJ1 group of genes. Nucleotides in the junctions between the identified Vβ, Dβ, and Jβ gene segments were considered to be nucleotide additions.
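The sequential alignment described above can be illustrated with a short Python sketch. It is a simplified, greedy reading of the procedure and not the original implementation: the function names are hypothetical, the germline strings are toy sequences rather than IMGT reference sequences, and the TCRβ sequence and the Vβ gene are assumed to start at the same position.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def longest_common_substring_len(a, b):
    """Length of the longest contiguous stretch shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def min_additions(seq, v_germ, d_germ, j_germ, min_d=2):
    """Estimate nucleotide additions by aligning V, then J, then D."""
    v_len = common_prefix_len(seq, v_germ)              # V from the 5' end
    rest = seq[v_len:]
    j_len = common_prefix_len(rest[::-1], j_germ[::-1])  # J from the 3' end
    junction = rest[:len(rest) - j_len]
    d_len = longest_common_substring_len(junction, d_germ)
    if d_len < min_d:                                    # need at least 2 nt for D
        d_len = 0
    return len(junction) - d_len

# Toy germline segments (assumptions for illustration only).
V_TOY = "TGTGCCAGCAGTTTA"
D_TOY = "GGACAGGG"
J_TOY = "AACTATGGCTACACC"

germline_only = "TGTGCCAGCAGT" + "ACAGG" + "TATGGCTACACC"
with_additions = "TGTGCCAGCAGT" + "C" + "ACAGG" + "TT" + "TATGGCTACACC"
```

Because each germline gene is extended as far as it matches, added nucleotides that happen to mimic deleted germline nucleotides are attributed to the gene, which is why the result is a minimum.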
Simulation of V(D)J recombination
The V(D)J recombination process was simulated for TCRβ sequences using the TRBV12-4/TRBJ1-2 and TRBV20-1/TRBJ1-2 gene combinations to estimate the relative production frequencies, in the absence of recombination bias, of observed TCRβ sequences. The recombination of multiple Vβ and Jβ genes was not simulated owing to insufficient information about biases in the Vβ and Jβ pairing process. The simulated V(D)J recombination process involved a randomly determined number, between 0 and 12, of nucleotide deletions from both the 3′ end of the germline Vβ gene and the 5′ end of the germline Jβ gene. Nucleotide deletions from both the 5′ and 3′ ends of the germline Dβ gene segment were allowed, up to the full length of the Dβ gene. It was assumed that only the TRBD1 gene was involved in the production of TCRβ sequences using genes from the TRBJ1 gene group. The number of nucleotide additions and their nucleotide bases were also randomly chosen, with up to 12 nt in total added between the germline Vβ and Dβ gene segments and Dβ and Jβ gene segments. The number of nucleotides removed from both the Vβ and Jβ genes and the number of nucleotides added in the Vβ-Jβ junction, which were determined from the alignments of the observed CMV-NV9-specific TRBV12-4/TRBJ1-2 and EBV-GL9-specific TRBV20-1/TRBJ1-2 TCRβ sequences with the germline genes, were used as a guide in choosing the number of nucleotide additions and deletions used in the simulations. Although the TCRβ sequences observed experimentally in many individuals had a bias toward fewer nucleotide additions, we allowed an equal probability of adding from 0 to 12 nt, thus effectively biasing TCRβ production against convergence. The simulations of the V(D)J recombination process were performed using Matlab 7.0.1 (The MathWorks). More in-depth detail about the simulation process is provided in a previous paper (37).
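A minimal Python sketch of a random recombination simulation of this kind is given below. It is not the Matlab code used in the study. The germline strings are toy sequences rather than IMGT reference sequences, the function names and the way the additions are split between the two junctions are assumptions, and no in-frame filtering or translation is performed.

```python
import random
from collections import Counter

BASES = "ACGT"

def simulate_once(rng, v_germ, d_germ, j_germ, max_del=12, max_add=12):
    """Produce one recombined nucleotide sequence from toy germline genes."""
    # Deletions from the 3' end of V and the 5' end of J (0 to max_del nt).
    v_del = rng.randint(0, min(max_del, len(v_germ)))
    j_del = rng.randint(0, min(max_del, len(j_germ)))
    # Deletions from both ends of D, up to the full length of D.
    d5 = rng.randint(0, len(d_germ))
    d3 = rng.randint(0, len(d_germ) - d5)
    # Total additions drawn uniformly from 0..max_add, then split at random
    # between the V-D and D-J junctions (the split rule is an assumption).
    n_add = rng.randint(0, max_add)
    n_vd = rng.randint(0, n_add)
    add_vd = "".join(rng.choice(BASES) for _ in range(n_vd))
    add_dj = "".join(rng.choice(BASES) for _ in range(n_add - n_vd))
    v_part = v_germ[:len(v_germ) - v_del]
    d_part = d_germ[d5:len(d_germ) - d3]
    j_part = j_germ[j_del:]
    return v_part + add_vd + d_part + add_dj + j_part

def simulate(n, v_germ, d_germ, j_germ, seed=0):
    """Count how often each nucleotide sequence is produced in n trials."""
    rng = random.Random(seed)
    return Counter(simulate_once(rng, v_germ, d_germ, j_germ) for _ in range(n))

# Toy germline segments (assumptions for illustration only).
V_TOY = "TGTGCCAGCAGTTTA"
D_TOY = "GGACAGGG"
J_TOY = "AACTATGGCTACACC"

counts = simulate(500, V_TOY, D_TOY, J_TOY, seed=1)
```

The counts returned by `simulate` play the role of an in silico production frequency: sequences that can be reached by many deletion and addition combinations accumulate more hits than sequences that need one specific set of additions.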
Analysis of the relationship between in silico TCR production and in vivo TCR sharing
Some of the TCRβ sequences observed in vivo consisted of deviations (i.e., most likely allelic differences or sequencing errors) from the reference germline TCR genes that were outside the junction between the Vβ and Jβ genes. A small subset of these TCRβ sequences (i.e., 2 of the 22 unique EBV-GL9-specific TRBV20-1/TRBJ1-2 sequences, which were both unshared) could not be made with the parameter range considered in the simulations and were thus excluded from this analysis.
Statistical analysis
Statistical analyses were performed using GraphPad Prism software (GraphPad). The reported trends between the sharing of TCR sequences and both the sequence alignment data and simulation data were assessed using Spearman rank correlations.
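For readers who wish to reproduce this type of analysis without GraphPad Prism, the Spearman rank correlation coefficient can be sketched in plain Python as the Pearson correlation of tie-averaged ranks. This is an illustrative sketch, not the software used in the study; the function names are hypothetical, no p value is computed, and the function is undefined (division by zero) when either variable is constant.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Tie handling matters for data of this kind, because both the number of individuals sharing a sequence and the number of nucleotide additions take a small set of integer values.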
Results
Sharing in the CMV-NV9-specific and EBV-GL9-specific TCRβ repertoires
We have investigated the sharing of TCR β-chain sequences between individuals in both the HLA-A*0201-restricted CMV-NV9-specific and EBV-GL9-specific CD8+ T cell responses. The TCRβ repertoire data used in this study consisted of a total of 2082 CMV-NV9-specific TCRβ sequences obtained from 23 individuals and 754 EBV-GL9-specific TCRβ sequences sampled from 10 individuals. Although clonal dominance (i.e., the number of copies of a TCRβ sequence) and TCRβ sharing are not mutually exclusive (because the dominance of a TCRβ affects the probability that it will be sampled), there are many factors that contribute to the clonal dominance of a TCRβ sequence that do not play a role in TCRβ sharing. Furthermore, there does not appear to be a direct association between TCRβ sharing and clonal dominance in the CMV-NV9-specific and EBV-GL9-specific CD8+ T cell responses, because the shared epitope-specific TCRβ sequences were often subdominant in the individuals in which they were observed (19). Thus, to dissociate these two features, we consider only the presence of unique TCRβ sequences in different individuals, regardless of their clonal dominance. The characteristics of the CMV-NV9-specific and EBV-GL9-specific TCRβ repertoires are summarized in Table I.
Table I. The characteristics of the CMV-NV9-specific and EBV-GL9-specific CD8+ TCRβ repertoires
 | CMV-NV9 | EBV-GL9 |
---|---|---|
No. of individuals | 23 | 10 |
No. of TCRβ sequences | 2082 | 754 |
Mean no. of TCRβ sequences per individual | 90.5 | 75.4 |
Range of no. of TCRβ sequences per individual | 28–203 | 44–122 |
TCRβ a.a.ᵃ sequences | | |
Mean no. of different a.a. sequences per individual | 8.3 | 9.4 |
Range of no. of different a.a. sequences per individual | 2–16 | 3–18 |
No. of different a.a. sequences across all individuals | 170 | 85 |
No. of sharedᵇ a.a. sequences across all individuals | 8 | 6 |
Max. no. of individuals sharing an a.a. sequence | 8 | 5 |
Max. no. of n.t.ᶜ sequences encoding an a.a. sequence: | | |
-in a single individual | 4 | 4 |
-across all individuals | 6 | 5 |
TCRβ nucleotide sequences | | |
No. of different n.t. sequences across all individuals | 210 | 100 |
No. of shared n.t. sequences across all individuals | 4 | 1 |
Max. no. of individuals sharing a n.t. sequence | 3 | 2 |
ᵃ a.a., Amino acid.
ᵇ Shared, present in at least two individuals.
ᶜ n.t., Nucleotide.
The CMV-NV9-specific TCRβ repertoire, pooled across all individuals, consisted of 170 different TCRβ amino acid sequences (Supplemental Table S1), 4.7% of which were present in more than one individual. The most shared CMV-NV9-specific TCRβ amino acid sequence was CASSSANYGYT, which used the TRBV12-4 and TRBJ1-2 genes. This TCRβ sequence was found in 8 of the 23 individuals involved in this study, and it has also been observed in an additional 3 individuals in other published studies (16, 18) (Table II). In addition to the sharing of TCRβ amino acid sequences, sharing of TCRβ nucleotide sequences was observed. Four CMV-NV9-specific TCRβ nucleotide sequences were found in more than one individual, with two TCRβ nucleotide sequences shared by two individuals and another two sequences shared by three individuals.
Table II. The CD8+ TCRβ amino acid sequences observed responding to the CMV-NV9 and EBV-GL9 epitopes in more than one individualᵃ
Virus-Epitope | TRBV | CDR3β a.a.ᵇ Sequence | TRBJ | No. of Individuals Sharingᶜ TCRβ a.a. Sequenceᵈ | No. of Individuals Sharing TCRβ a.a. Sequence in Other Published Studies |
---|---|---|---|---|---|
CMV-NV9 | 12-4 | CASSSANYGYT | 1-2 | 8 | 2 (Ref. 16), 1 (Ref. 18)ᵉ |
 | 7-6 | CASSLAPGATNEKLF | 1-4 | 5 | 1 (Ref. 18) |
 | 12-4 | CASSSAHYGYT | 1-2 | 4 | 1 (Ref. 18)ᵉ |
 | 27 | CASSLEGYTEAF | 1-1 | 3 | 3 (Ref. 18) |
 | 12-4 | CASSSAYYGYT | 1-2 | 2 | |
 | 12-4 | CASSLVGGRYGYT | 1-2 | 2 | 1 (Ref. 11)ᵉ |
 | 12-4 | CASSVVNEQF | 2-1 | 2 | |
 | 27 | CASSLTSGSPYNEQF | 2-1 | 2 | |
 | 28 | CASSFQGYTEAF | 1-1 | 1 | 1 (Ref. 18), 1 (Ref. 9) |
EBV-GL9 | 20-1 | CSARDGTGNGYT | 1-2 | 5 | 3 (Ref. 25) |
 | 20-1 | CSARDRTGNGYT | 1-2 | 2 | 3 (Ref. 25) |
 | 20-1 | CSARDRVGNTIY | 1-3 | 2 | 2 (Ref. 25) |
 | 20-1 | CSARVGVGNTIY | 1-3 | 2 | |
 | 29-1 | CSVGTGGTNEKLF | 1-4 | 2 | 5 (Ref. 25) |
 | 29-1 | CSSQEGGYGYT | 1-2 | 2 | 1 (Ref. 24) |
 | 20-1 | CSARDQTGNGYT | 1-2 | 1 | 1 (Ref. 25) |
 | 20-1 | CSARDRIGNGYT | 1-2 | 1 | 1 (Ref. 25) |
 | 20-1 | CSARIGVGNTIY | 1-3 | 1 | 1 (Ref. 25) |
 | 20-1 | CSARSGVGNTIY | 1-3 | 1 | 2 (Ref. 25) |
 | 2 | CASSEGRVSPGELF | 2-2 | 1 | 1 (Ref. 25) |
ᵃ This table includes the TCRβ sequences observed in this study that were either shared between multiple individuals in this study or also observed in another individual in a previously published study.
ᵇ a.a., Amino acid.
ᶜ Sharing, present in at least two individuals.
ᵈ The present study includes TCRβ amino acid sequences obtained in a previous study (Ref. 19) as well as our more recent unpublished data.
ᵉ This study identified these TCRβ sequences as using the TRBV12-3 (or Vβ8S1 in Arden nomenclature; Ref. 50) gene. However, due to the difficulty of distinguishing between the TRBV12-3 and TRBV12-4 genes, we have assumed that these TCRβ sequences could have also used the TRBV12-4 gene.
The EBV-GL9-specific TCRβ repertoire consisted of 85 different TCRβ amino acid sequences obtained from a total of 10 individuals (Supplemental Table S2). Of these unique TCRβ amino acid sequences, 7.1% were shared between individuals. The most shared EBV-GL9-specific TCRβ amino acid sequence, CSARDGTGNGYT, used the TRBV20-1 and TRBJ1-2 genes. This TCRβ sequence was observed in 5 of the 10 individuals in this study, and it has previously been observed in three other individuals (25) (Table II). Although there was a high degree of sharing of TCRβ amino acid sequences, there was relatively little sharing of TCRβ nucleotide sequences. Only one EBV-GL9-specific TCRβ nucleotide sequence was observed responding in more than one individual, and this sequence was found in two individuals.
The CD8+ T cell responses to both the CMV-NV9 and EBV-GL9 epitopes have been characterized previously as public T cell responses, with several studies reporting TCRβ sequences observed in more than one donor (11, 16, 18, 19, 24, 25). However, our much larger study (i.e., involving more TCRβ sequences and more individuals) reveals a spectrum in the number of individuals sharing CMV-NV9-specific and EBV-GL9-specific TCRβ amino acid sequences, rather than the dichotomy of public vs private TCRs that is often reported. In the following sections, we investigate the influence of TCRβ production frequency on the sharing of TCRβ sequences in these epitope-specific responses. We study the potential mechanisms by which these TCRβ sequences can be produced. In particular, we focus on both the germline encoding of the TCRβ sequences and convergent mechanisms that could impact upon the prevalence of the TCRβ sequences in the thymus and, thus, the naive TCR repertoire.
Shared CMV-NV9-specific and EBV-GL9-specific TCRβ sequences require fewer nucleotide additions
Although the recombination of various germline V, D, and J genes can produce a variety of different TCR sequences, it is the addition of nucleotides in the junctions between the truncated gene segments that substantially enhances TCR diversity. In contrast, a TCR sequence that requires fewer nucleotide additions involves less of a random element and is thus much easier to reproduce via the same gene recombination mechanism (usually involving a minimal number of nucleotide additions) or a variety of recombination mechanisms (i.e., different germline gene contributions and nucleotide additions, which may involve the deletion of nucleotides from the ends of the gene segments and the addition of nucleotides that mimic these deleted nucleotides) (37, 38, 40). The minimum number of random nucleotide additions required to produce a TCRβ sequence is therefore one predictive feature of a TCRβ nucleotide sequence that has the potential to be produced efficiently, and thus be present in a larger number of individuals. Thus, a relationship between TCR production frequency and the observed spectrum of sharing of TCR sequences should be reflected by a tendency for TCR sequences that are observed in more individuals to require fewer nucleotide additions.
We investigated potential V(D)J recombination events for each of the observed CMV-NV9-specific and EBV-GL9-specific TCRβ nucleotide sequences using a basic algorithm to align the germline genes with the TCRβ sequence. This algorithm determined one V(D)J recombination mechanism, from many potential mechanisms, that could have produced the TCRβ sequence. The determined V(D)J recombination mechanism involved a maximal contribution by the germline genes, and thus a minimal contribution of nucleotide additions. Germline Vβ, Jβ, and Dβ genes were aligned sequentially with the TCRβ sequence, and any nucleotides found in the junctions between the identified contributions from gene segments were considered to be nucleotide additions. An example of alignments of CMV-NV9-specific and EBV-GL9-specific TCRβ sequences with the germline genes is shown in Fig. 1.
FIGURE 1. An illustration of the role of convergent recombination in enhancing the potential production efficiency of CMV-NV9-specific and EBV-GL9-specific CD8+ TCRβ sequences. The most shared CMV-NV9-specific TCRβ sequence, CASSSANYGYT, and the shared EBV-GL9-specific TCRβ sequences, CSARDGTGNGYT and CSARDRVGNTIY, are used to demonstrate how convergent recombination allows some TCRβ sequences to be produced more frequently than others. Some epitope-specific TCRβ amino acid sequences are found to be encoded by multiple nucleotide sequences (A). In this figure, we show one possible alignment of the TCRβ sequences with the TRBV12-4 or TRBV20-1 (blue), TRBD1 (red), and TRBJ1-2 or TRBJ1-3 (green) genes. This alignment represents one V(D)J recombination mechanism that required a minimum number of nucleotide additions (black). However, these TCRβ sequences could potentially have been made by many different recombination events involving different contributions of germline gene segments and different nucleotide additions. The production of some nucleotide sequences can involve many different V(D)J recombination mechanisms that each require few nucleotide additions. In this case, each recombination event also has the potential to occur frequently, owing to the requirement for fewer nucleotide additions. We use one of the shared nucleotide sequences that encodes the most shared CMV-NV9-specific amino acid sequence, CASSSANYGYT, to demonstrate the variety of recombination mechanisms that can produce a single, fully germline-encoded nucleotide sequence (B, yellow box). We use another nucleotide sequence to demonstrate how an overlap between two gene segments (in this case, the TRBV20-1 and TRBD1 genes) can increase the number of different ways that the nucleotide sequence can be produced (B, brown box). This particular nucleotide sequence encodes the EBV-GL9-specific TCRβ amino acid sequence, CSARDRVGNTIY, that was observed in two individuals in this study and in two other individuals in a previously published study (25) (Table II). These many recombination mechanisms across the V(D)J junction involve no more than 2 nt additions. The germline genes used in this analysis are provided in C.
Our investigation of the relationship between the extent of sharing of the TCRβ amino acid sequences between individuals and the minimum number of nucleotide additions required to produce the nucleotide sequences encoding these TCRβ amino acid sequences revealed significant negative correlations (CMV-NV9, r = −0.44, p < 0.0001; EBV-GL9, r = −0.31, p = 0.0017; Spearman; Fig. 2, A and B). The minimum number of nucleotide additions involved in TCRβ sequence production was also found to be significantly and negatively correlated with the sharing of CMV-NV9-specific TCRβ nucleotide sequences (r = −0.19, p = 0.0047; Spearman; Fig. 3). Thus, shared TCRβ amino acid sequences required, on average, fewer nucleotide additions than unshared sequences. However, some of the most shared TCRβ amino acid sequences still required several nucleotide additions. For example, the most shared EBV-GL9-specific sequence, CSARDGTGNGYT, could not be made with fewer than 2 nt additions (Fig. 2B).
FIGURE 2. Analysis of the CMV-NV9-specific and EBV-GL9-specific TCRβ amino acid sequence repertoires. The relationship between the minimum numbers of nucleotide additions required to produce CMV-NV9-specific (A) and EBV-GL9-specific (B) TCRβ sequences and the number of individuals in which the epitope-specific TCRβ amino acid (a.a.) sequences were observed. We also show the relationship between the number of different nucleotide (n.t.) sequences observed encoding the CMV-NV9-specific (C) and EBV-GL9-specific (D) TCRβ amino acid sequences across the pooled repertoires of all individuals, and the extent of sharing of the epitope-specific TCRβ amino acid sequences. The most shared CMV-NV9-specific TCRβ amino acid sequence, found in eight individuals, was CASSSANYGYT, which used the TRBV12-4 and TRBJ1-2 genes (A and C). The most shared EBV-GL9-specific TCRβ amino acid sequence, found in five individuals, was CSARDGTGNGYT, which used the TRBV20-1 and TRBJ1-2 genes (B and D). The opacity of the circles indicates the density of points plotted, and the solid horizontal lines represent the medians of the estimated minimum number of nucleotide additions or the number of unique nucleotide sequences encoding an amino acid sequence. The correlation and significance values are based on the Spearman test.
Analysis of the germline encoding of CMV-NV9-specific TCRβ nucleotide sequences. The relationship between the minimum number of nucleotide additions required to produce CMV-NV9-specific TCRβ sequences and the number of individuals in which the CMV-specific TCRβ nucleotide (n.t.) sequences were observed. The two most shared CMV-NV9-specific TCRβ nucleotide sequences, found in three individuals, encoded CASSSANYGYT and used the TRBV12-4 and TRBJ1-2 genes. The opacity of the circles indicates the density of points plotted, and the solid horizontal lines represent the medians of the estimated minimum number of nucleotide additions. The correlation and significance values are based on the Spearman test.
Shared CMV-NV9-specific and EBV-GL9-specific TCRβ amino acid sequences are encoded by multiple nucleotide sequences
A standard probabilistic model predicts that a TCR amino acid sequence that is prevalent between individuals would also be prevalent within an individual (39). The observed trend for public TCRs to be encoded by many different nucleotide sequences within an individual (20, 22, 25, 37) as well as between individuals (19, 20, 22, 25, 37) suggests that public TCRs may be prevalent within the individuals in whom they are observed. It has been observed previously that public CMV-NV9-specific and EBV-GL9-specific TCRβ amino acid sequences can be encoded by multiple nucleotide sequences (19, 25). We used the larger data set considered in this study to investigate the relationship between the observed spectrum in the number of individuals sharing CMV-NV9-specific and EBV-GL9-specific TCRβ amino acid sequences and the variety of nucleotide sequences found encoding the amino acid sequences.
Our analysis of the TCRβ repertoires involved in the CD8+ T cell responses to the CMV-NV9 and EBV-GL9 epitopes revealed that many TCRβ amino acid sequences were encoded by multiple nucleotide sequences both across the pooled repertoires of all individuals and within individuals (Table I and Supplemental Tables S1 and S2). The number of individuals in which epitope-specific TCRβ amino acid sequences were observed was found to be positively and significantly correlated with the number of nucleotide sequences encoding the amino acid sequences across all individuals (CMV-NV9, r = 0.62, p < 0.0001; EBV-GL9, r = 0.75, p < 0.0001; Spearman; Fig. 2, C and D). These results suggest that the shared CMV-NV9-specific and EBV-GL9-specific TCRβ amino acid sequences may have been prevalent within the naive repertoire of each individual, which supports a relationship between the sharing of TCRβ and the frequency with which TCRβ are produced in the thymus via gene recombination. Moreover, they suggest that the variety of different ways in which these epitope-specific TCRβ sequences can be made, that is, convergent recombination, is an influential factor in the observed prevalence of some TCRβ amino acid sequences.
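The tally behind this kind of correlation can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the record layout `(individual, amino acid sequence, nucleotide sequence)`, the function names, and the toy labels (`P1`, `AA1`, `nt1`, and so on) are all assumptions introduced here, and the Spearman coefficient is computed as the Pearson correlation of tie-averaged ranks.

```python
from collections import defaultdict

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5

def sharing_summary(records):
    """Map each a.a. sequence to (individuals sharing it, distinct n.t. sequences)."""
    individuals = defaultdict(set)
    nt_variants = defaultdict(set)
    for person, aa, nt in records:
        individuals[aa].add(person)
        nt_variants[aa].add(nt)
    return {aa: (len(individuals[aa]), len(nt_variants[aa])) for aa in individuals}

# Hypothetical toy records, for illustration only (not data from the study).
records = [
    ("P1", "AA1", "nt1"), ("P2", "AA1", "nt2"), ("P3", "AA1", "nt1"), ("P3", "AA1", "nt3"),
    ("P1", "AA2", "nt4"), ("P2", "AA2", "nt4"),
    ("P1", "AA3", "nt5"),
    ("P2", "AA4", "nt6"),
]
summary = sharing_summary(records)
sharing = [summary[aa][0] for aa in sorted(summary)]
variety = [summary[aa][1] for aa in sorted(summary)]
r = spearman(sharing, variety)
```

With real data, a significance value would also be needed; a statistics library would normally supply both the coefficient and the p value.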
The role of convergent recombination in the production of the CMV-NV9-specific and EBV-GL9-specific TCRβ repertoires
Our analysis shows that the sharing of CMV-NV9-specific and EBV-GL9-specific TCRβ sequences between individuals is significantly correlated with both the extent of germline encoding and the number of different nucleotide sequences encoding the TCRβ amino acid sequences. Both of these features suggest that a process of convergent recombination facilitates the production of different TCRβ sequences at different frequencies. Using the most shared CMV-NV9-specific TCRβ amino acid sequence, CASSSANYGYT, and shared EBV-GL9-specific TCRβ amino acid sequences, CSARDGTGNGYT and CSARDRVGNTIY, we illustrate the various levels at which convergent recombination contributes to the production efficiency of a TCRβ sequence (Fig. 1). A TCRβ amino acid sequence may be encoded by a variety of nucleotide sequences, and some of these nucleotide sequences may be produced in many different ways by different contributions from the germline genes and nucleotide additions. We show the nucleotide sequences found encoding each of the example amino acid sequences (Fig. 1A), and we demonstrate the variety of different ways that nucleotide sequences can be produced using one of the nucleotide sequences encoding the CMV-specific CASSSANYGYT TCRβ sequence and the EBV-GL9-specific CSARDRVGNTIY TCRβ sequence (Fig. 1B).
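To make the idea of one nucleotide sequence arising by several routes concrete, the sketch below enumerates every way a target junction can be split into a truncated V segment, a run of added nucleotides, and a truncated J segment. It is a deliberately simplified illustration: the Dβ segment is ignored for brevity, the germline strings are toy placeholders rather than real TRBV or TRBJ sequences, and the function name `recombination_routes` is an assumption introduced here.

```python
def recombination_routes(target, v_germ, j_germ):
    """List (V deletions, added nucleotides, J deletions) triples that yield target."""
    routes = []
    for v_keep in range(len(v_germ) + 1):
        # Keep the first v_keep bases of V (deletion acts on the 3' end).
        if not target.startswith(v_germ[:v_keep]):
            break  # a longer V prefix cannot match either
        for j_keep in range(len(j_germ) + 1):
            if v_keep + j_keep > len(target):
                break  # V and J contributions may not overlap
            # Keep the last j_keep bases of J (deletion acts on the 5' end).
            if not target.endswith(j_germ[len(j_germ) - j_keep:]):
                break  # a longer J suffix cannot match either
            added = target[v_keep:len(target) - j_keep]
            routes.append((len(v_germ) - v_keep, added, len(j_germ) - j_keep))
    return routes

# Toy placeholder segments (hypothetical, not real germline genes).
V_TOY = "TGTGCCAGC"
J_TOY = "AGCTACACC"
TARGET = "TGTGCCAGCTACACC"

routes = recombination_routes(TARGET, V_TOY, J_TOY)
min_additions = min(len(added) for _, added, _ in routes)
germline_only = [r for r in routes if len(r[1]) == min_additions]
```

In this toy case the target can be assembled with no added nucleotides in four different ways, because the end of the V placeholder and the start of the J placeholder overlap. Routes that need more additions are also valid but would be expected to occur less often if additions are random.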
Convergent recombination in an unbiased V(D)J recombination process facilitates the efficient production of shared CMV-NV9-specific and EBV-GL9-specific TCRβ sequences
The analysis reported in the previous sections suggests that the CMV-NV9-specific and EBV-GL9-specific TCRβ amino acid sequences observed in many individuals have the potential to be produced efficiently via V(D)J recombination. However, examining a TCRβ sequence does not enable us to determine the actual recombination event (i.e., contribution from each of the germline Vβ, Dβ, and Jβ genes and the nucleotide additions) by which it was produced. Moreover, it is difficult to assess the cumulative effect of fewer nucleotide additions and the variety of different ways that TCRβ nucleotide and amino acid sequences can be made on the enhancement of TCRβ amino acid sequence production frequency. We therefore used simulations to assess whether a completely random V(D)J recombination process could enable some TCRβ sequences to be produced more frequently than others, and whether the epitope-specific TCRβ sequences observed in many individuals tended to be produced efficiently by unbiased gene recombination.
We simulated the random generation of the portions of the TCRβ repertoire that use the TRBV12-4/TRBJ1-2 and TRBV20-1/TRBJ1-2 gene combinations. These gene combinations were used by the most shared CMV-NV9-specific and EBV-GL9-specific TCRβ amino acid sequences, respectively. The simulation involved the random deletion of nucleotides from the 3′ end of the Vβ, 3′ and 5′ ends of the Dβ, and the 5′ end of the Jβ genes, and the random addition of nucleotides between the truncated Vβ and Dβ, and Dβ and Jβ gene segments. Twenty million in-frame sequences were generated for each of the in silico TRBV12-4/TRBJ1-2 and TRBV20-1/TRBJ1-2 TCRβ repertoires.
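A simulation of this general shape can be sketched as follows. This is a minimal sketch under stated assumptions, not the simulation used in the study: the germline strings are toy placeholders rather than real TRBV, TRBD, or TRBJ sequences, deletion and addition lengths are drawn uniformly up to assumed limits, "in-frame" is approximated as a length divisible by three with no stop codon, and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Toy germline segments: hypothetical placeholders, NOT real gene sequences.
V_GERM = "TGTGCCAGCAGT"
D_GERM = "GGGACAGGG"
J_GERM = "CTATGGCTACACC"
MAX_V_DEL, MAX_D_DEL, MAX_J_DEL, MAX_ADD = 6, 4, 6, 6  # assumed limits

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}

def translate(nt):
    """Translate complete codons of nt, reading from the first base."""
    return "".join(CODON[nt[i:i + 3]] for i in range(0, len(nt) - len(nt) % 3, 3))

def random_addition(rng):
    """Random run of 0..MAX_ADD nucleotides (uniform length and composition)."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, MAX_ADD)))

def recombine(rng):
    """One random recombination event: returns (sequence, event description)."""
    v_del = rng.randint(0, MAX_V_DEL)
    d5_del = rng.randint(0, MAX_D_DEL)
    d3_del = rng.randint(0, MAX_D_DEL)
    j_del = rng.randint(0, MAX_J_DEL)
    n1, n2 = random_addition(rng), random_addition(rng)
    seq = (V_GERM[:len(V_GERM) - v_del] + n1
           + D_GERM[d5_del:len(D_GERM) - d3_del] + n2
           + J_GERM[j_del:])
    return seq, (v_del, n1, d5_del, d3_del, n2, j_del)

def simulate(n_in_frame, seed=0):
    """Generate n_in_frame accepted sequences; count each and record its events."""
    rng = random.Random(seed)
    nt_counts, nt_events = Counter(), defaultdict(set)
    accepted = 0
    while accepted < n_in_frame:
        seq, event = recombine(rng)
        if len(seq) % 3 != 0 or "*" in translate(seq):
            continue  # discard out-of-frame or stop-codon sequences
        nt_counts[seq] += 1
        nt_events[seq].add(event)
        accepted += 1
    return nt_counts, nt_events

def summarise_by_aa(nt_counts, nt_events):
    """Per a.a. sequence: (times produced, distinct n.t. sequences, distinct events)."""
    produced, variants, events = Counter(), defaultdict(set), defaultdict(set)
    for seq, count in nt_counts.items():
        aa = translate(seq)
        produced[aa] += count
        variants[aa].add(seq)
        events[aa] |= nt_events[seq]
    return {aa: (produced[aa], len(variants[aa]), len(events[aa])) for aa in produced}
```

Calling `summarise_by_aa(*simulate(20000))` gives, for each amino acid sequence, the three quantities that can then be compared with the number of individuals sharing that sequence in vivo. Note that the event count is limited to events actually sampled, so it grows with the size of the simulated repertoire.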
The in silico TRBV12-4/TRBJ1-2 TCRβ repertoire was used to show that the most shared CMV-NV9-specific TCRβ amino acid sequence, CASSSANYGYT, was produced much more efficiently than the average production frequency of all the observed CMV-NV9-specific TCRβ amino acid sequences using the TRBV12-4 and TRBJ1-2 genes (Fig. 4A). The frequent production of this TCRβ amino acid sequence in the simulations was due to both a large variety of nucleotide sequences (i.e., 69 different nucleotide sequences) encoding this amino acid sequence in silico (Fig. 4C) and the efficient in silico production of several of these nucleotide sequences by a variety of recombination mechanisms (Fig. 4E). For example, the most efficiently produced TCRβ nucleotide sequence encoding CASSSANYGYT was made 535 times in the simulation by 63 different recombination mechanisms (Fig. 5). This particular nucleotide sequence was one of the two most shared CMV-NV9-specific TCRβ nucleotide sequences. We also investigated the relationship between in silico production frequency and the observed spectrum in the number of individuals sharing the CMV-NV9-specific TRBV12-4/TRBJ1-2 TCRβ sequences in vivo. We found a significant and positive correlation between the number of individuals found sharing TCRβ amino acid sequences in vivo and the number of times the TCRβ amino acid sequences were produced in silico (r = 0.58, p = 0.024; Spearman; Fig. 4A). We observed a trend for the more shared CMV-NV9-specific TRBV12-4/TRBJ1-2 TCRβ amino acid sequences to be encoded by a larger variety of nucleotide sequences in the in silico repertoire (r = 0.48, p = 0.070; Spearman; Fig. 4C). However, the in vivo sharing of CMV-NV9-specific TRBV12-4/TRBJ1-2 TCRβ amino acid sequences was more dependent on the overall variety of ways that these TCRβ sequences could be produced in the simulations (r = 0.63, p = 0.012; Spearman; Fig. 4E).
In addition to the in silico production of shared TCRβ amino acid sequences, we also examined the relationship between in silico production and the sharing of TCRβ nucleotide sequences. We found significant correlations between the extent of sharing of CMV-NV9-specific TRBV12-4/TRBJ1-2 TCRβ nucleotide sequences in vivo and both the number of times the nucleotide sequences were produced (r = 0.50, p = 0.014; Spearman; Fig. 5A) and the number of different recombination mechanisms by which they were produced (r = 0.53, p = 0.0075; Spearman; Fig. 5B) in the simulation.
Analysis of in silico TCRβ repertoires with respect to the in vivo sharing of CMV-NV9-specific and EBV-GL9-specific TCRβ amino acid sequences. Simulations of a random V(D)J recombination process generated TCRβ repertoires using the TRBV12-4 and TRBJ1-2 genes, and the TRBV20-1 and TRBJ1-2 genes, each consisting of 20 million in-frame TCRβ sequences. These in silico TCRβ repertoires were used to investigate the relationships between the number of individuals in which CMV-NV9-specific (A, C, and E) and EBV-GL9-specific (B, D, and F) TCRβ amino acid (a.a.) sequences were observed in vivo (horizontal axes) and the in silico generation of the TCRβ amino acid sequences (vertical axes). The number of times the TCRβ amino acid sequences were generated in silico (A and B) reflects their relative potential production efficiency via a random V(D)J recombination process. The number of different nucleotide (n.t.) sequences encoding the amino acid sequences (C and D), and the number of different V(D)J recombination mechanisms that produced the amino acid sequences in the simulation (E and F) demonstrate the role of convergent recombination in the frequency of TCRβ production. Each point on the graph represents a TCRβ amino acid sequence observed in vivo. The most shared CMV-NV9-specific TCRβ amino acid sequence, found in eight individuals, was CASSSANYGYT, which used the TRBV12-4 and TRBJ1-2 genes (A, C, and E). The most shared EBV-GL9-specific TCRβ amino acid sequence, found in five individuals, was CSARDGTGNGYT, which used the TRBV20-1 and TRBJ1-2 genes (B, D, and F). The opacity of the circles indicates the density of points plotted, and the solid horizontal lines represent the medians of the quantities displayed on the vertical axes. The dashed horizontal lines extending the width of the plots in A and B represent the mean frequency of sequence generation across all TCRβ amino acid sequences, regardless of the extent of sharing. The correlation and significance values are based on the Spearman test.
Analysis of an in silico TCRβ repertoire with respect to in vivo sharing of CMV-NV9-specific TCRβ nucleotide sequences. Simulations of a random V(D)J recombination process generated a TCRβ repertoire using the TRBV12-4 and TRBJ1-2 genes and consisting of 20 million in-frame TCRβ sequences. This in silico TCRβ repertoire was used to investigate the relationships between the number of individuals in which CMV-NV9-specific TCRβ nucleotide (n.t.) sequences were observed in vivo (horizontal axes) and the in silico generation of the TCRβ nucleotide sequences (vertical axes). The number of times the TCRβ nucleotide sequences were generated in silico (A) reflects their relative potential production efficiency via a random V(D)J recombination process. The number of different V(D)J recombination mechanisms that produced the nucleotide sequences in the simulation (B) demonstrates the role of convergent recombination in the frequency of TCRβ production. Each point on the graph represents a TCRβ nucleotide sequence observed in vivo. The two most shared CMV-NV9-specific TCRβ nucleotide sequences, found in three individuals, encoded CASSSANYGYT and used the TRBV12-4 and TRBJ1-2 genes. The opacity of the circles indicates the density of points plotted, and the solid horizontal lines represent the medians of the quantities displayed on the vertical axes. The dashed horizontal line extending the width of the plot in A represents the mean frequency of sequence generation across all TCRβ nucleotide sequences, regardless of the extent of sharing. The correlation and significance values are based on the Spearman test.
A comparison of the in silico production frequency of the most shared EBV-GL9-specific TCRβ amino acid sequence, CSARDGTGNGYT, with the average frequency of production of all the observed EBV-GL9-specific TCRβ amino acid sequences using the TRBV20-1 and TRBJ1-2 genes showed that this sequence was produced relatively frequently in the random V(D)J recombination simulations (Fig. 4B). We note that the two TCRβ nucleotide sequences that encode the CSARDGTGNGYT amino acid sequence and were produced most frequently in silico were observed in the in vivo TCRβ repertoire. We also investigated the relationship between in silico production frequency and the observed spectrum in the number of individuals sharing EBV-GL9-specific TRBV20-1/TRBJ1-2 TCRβ amino acid sequences. There was a trend for EBV-GL9-specific TRBV20-1/TRBJ1-2 TCRβ amino acid sequences found in more than one individual to be produced frequently in the simulations (Fig. 4B). However, this correlation was not significant, partly due to a frequently produced unshared TCRβ amino acid sequence and smaller numbers of observed EBV-GL9-specific amino acid sequences using this specific gene combination. The latter is possibly a consequence of the smaller number of individuals considered for the EBV-GL9-specific response. The efficient production of the shared EBV-GL9-specific TCRβ amino acid sequences using the TRBV20-1 and TRBJ1-2 genes was due to these shared TCRβ sequences being produced by both a variety of nucleotide sequences (Fig. 4D) and a variety of recombination events (Fig. 4F).
Discussion
The sharing of TCRs between MHC-matched individuals has been observed in many CD8+ T cell responses specific for epitopes derived from CMV and EBV (9, 11, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 44). The sharing of TCRs in these epitope-specific responses has often been associated with the focused nature of the TCR repertoires. Thus, many studies have had an emphasis on understanding the selection of a biased TCR repertoire through structural features of the peptide-MHC complex and/or the TCR (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36). However, the TCRs observed in a response to a given CMV or EBV epitope in many individuals must first be present in the naive TCR repertoires of these individuals. This is not easily explained, given that, in each individual, the ∼10¹² peripheral T cells (2) are selected from a thymic TCR repertoire that potentially consists of >10¹⁵ different TCRs (1).
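A rough calculation shows why sharing is hard to explain if every TCR were equally likely to be produced. The following is an illustrative baseline only, not a result from this study. It assumes that all N potential TCRs are produced with equal probability, that each individual carries about n distinct TCRαβs (the estimate of ∼10⁷ cited in the introduction (2)), and that thymic selection is ignored; the symbols N and n are introduced here for the illustration.

```latex
% Uniform-production baseline (assumption): N potential TCRs, n distinct TCRs per individual.
\[
  N \approx 10^{15}, \qquad n \approx 10^{7}
\]
% Probability that one particular TCR is present in a given individual:
\[
  P(\text{present}) \approx \frac{n}{N} = \frac{10^{7}}{10^{15}} = 10^{-8}
\]
% Expected number of TCRs common to two individuals:
\[
  E[\text{shared}] \approx n \cdot \frac{n}{N} = \frac{10^{14}}{10^{15}} = 0.1
\]
```

Under these assumptions, two individuals would be expected to share fewer than one TCR across their entire repertoires, so the repeated observation of the same epitope-specific TCRs in many individuals points to strongly unequal production frequencies.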
We have proposed previously that a process of convergent recombination plays an important role in the efficiency with which TCRs are produced in the thymus, and that TCR production frequency is an important determinant of the observed extent of sharing of epitope-specific TCRs (37, 38, 39). In this study, we investigated these relationships for the CD8+ T cell responses to the HLA-A*0201-restricted CMV-NV9 and EBV-GL9 epitopes. Analysis of the epitope-specific TCRβ repertoires revealed a spectrum in the number of individuals sharing TCRβ amino acid sequences, as found in our previous studies in mice (37) and rhesus macaques (38). We determined that this spectrum of TCRβ sharing was significantly correlated with both the number of different nucleotide sequences found to encode the TCRβ amino acid sequences and the germline encoding of the TCRβ nucleotide sequences. These correlations suggest a relationship between TCRβ sharing and TCRβ production, because TCRβ amino acid sequences requiring fewer nucleotide additions and encoded by a greater variety of nucleotide sequences have the potential to be generated efficiently. They also suggest the involvement of convergent processes in the production of the TCRβ sequences by gene recombination. Computer simulations of a random gene recombination process were used to demonstrate that epitope-specific TCRβ amino acid sequences observed in more individuals in vivo are produced efficiently. A variety of V(D)J recombination mechanisms was involved in the in silico production of these shared epitope-specific TCRβ amino acid sequences and played a major role in some TCRβ sequences being produced more frequently than others. We also gained additional insight into the factors that contribute to TCRβ sharing by studying the sharing of the CMV-NV9-specific TCRβ nucleotide sequences because all of the different nucleotide sequences encoding a particular TCRβ amino acid sequence are subject to the same selective influences. The frequent production by unbiased V(D)J recombination in the simulations of the two most shared CMV-NV9-specific TCRβ nucleotide sequences is another strong indicator that TCR production frequency plays an important role in TCR sharing.
We have demonstrated that shared epitope-specific TCRβs are produced at relatively high frequencies during gene recombination, suggesting that a high level of TCRβ production is required for a TCRβ to be present in the epitope-specific responses of many individuals (37, 38, 39). However, the converse of this argument is not necessarily true. That is, just because a TCRβ can be produced easily does not mean that it will survive thymic selection, persist in the naive repertoire, and be involved in an immune response. Thymic and peripheral T cell selection in an outbred population can be influenced by MHC molecules, other than the matched one, that differ between individuals. There are also many factors (for example, the structural determinants of the TCR-peptide-MHC interaction (27, 28, 30, 32, 33, 34, 36), T cell competition for Ag (45), and stochastic events (46)), other than the precursor frequency of a TCR, that determine both the involvement and the dominance of a T cell clonotype in an immune response. Furthermore, it is the TCRαβ combination, and not just the TCRβ, that determines the thymic, peripheral, and Ag-driven selection of a T cell. For a TCRβ to be highly shared, it must also be paired with the correct TCR α-chain(s), which requires that the TCRβ is either promiscuous in its pairing with TCRα or that it pairs with a frequently produced TCRα (39). Thus, there are many factors that can prevent a frequently produced TCRβ from being shared in the Ag-specific responses of multiple individuals, and it is not surprising that we sometimes see unshared TCRβ sequences that also have the potential to be produced frequently (for example, the most frequently produced EBV-specific TCRβ sequence using the TRBV20-1/TRBJ1-2 genes was unshared (Fig. 4B)).
The results of this study suggest that convergent recombination can facilitate the production of different TCRs at different frequencies, even if gene recombination is a completely random process. The proposed relationship between the extent of sharing of epitope-specific TCRβ between individuals and TCRβ production frequency suggests that there is a tendency for the hierarchy of frequencies of different TCRβ established during production in the thymus to be maintained through thymic and peripheral selection. This is consistent with the findings of the recent study (47) that investigated the presence of public CDR3β sequences in the preselection, naive peripheral, and Ag-specific repertoires, and concluded that public TCRβ were not selected preferentially into the naive repertoire. However, due to the size and large diversity of the thymic and naive TCR repertoires, sufficient data are not yet available to compare the thymic, naive, and epitope-specific T cell populations. Such a comparison would require an experimental approach that intensively sequences specific portions of the TCR repertoire that are restricted in Vβ and Jβ usage, and possibly CDR3β length. Although more work is required to understand the role of shared TCRs in immune responses, this study provides insight into the factors that contribute to the sharing of TCRs between individuals that is observed in many investigations of the CD8+ T cell responses to CMV and EBV. Advancing our understanding of the CD8+ T cell responses that play an important role in controlling CMV and EBV infections has important implications for the design of CD8+ T cell epitope-based vaccines (48, 49).
Disclosures
The authors have no financial conflict of interest.
Footnotes
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
This work was supported by the James S. McDonnell Foundation 21st Century Research Award/Studying Complex Systems, the Australian Research Council, the National Health and Medical Research Council, and the National Institutes of Health. M.P.D. is a Sylvia and Charles Viertel Senior Medical Research Fellow, and D.A.P. is a Medical Research Council (U.K.) Senior Clinical Fellow.
The online version of this article contains supplemental material.