Abstract
The diversity, architecture, and dynamics of the TCR repertoire largely determine our ability to effectively withstand infections and malignancies with minimal mistargeting of immune responses. In this study, we have employed deep TCRβ repertoire sequencing with normalization based on unique molecular identifiers to explore the long-term dynamics of T cell immunity. We demonstrate remarkable stability of repertoire, where approximately half of all T cells in peripheral blood are represented by clones that persist and generally preserve their frequencies for 3 y. We further characterize the extremes of lifelong TCR repertoire evolution, analyzing samples ranging from umbilical cord blood to centenarian peripheral blood. We show that the fetal TCR repertoire, albeit structurally maintained within regulated borders due to the lower numbers of randomly added nucleotides, is not limited with respect to observed functional diversity. We reveal decreased efficiency of nonsense-mediated mRNA decay in umbilical cord blood, which may reflect specific regulatory mechanisms in development. Furthermore, we demonstrate that human TCR repertoires are functionally more similar at birth but diverge during life, and we track the lifelong behavior of CMV- and EBV-specific T cell clonotypes. Finally, we reveal gender differences in dynamics of TCR diversity constriction, which come to naught in the oldest age. Based on our data, we propose a more general explanation for the previous observations on the relationships between longevity and immunity.
Introduction
The native human TCR repertoire changes throughout life based on Ag experience, exhaustion, and nonuniformity of T cell proliferation at the periphery, as well as on the aging-associated evolution and involution of the thymus (1–6). At any given moment, our accumulated population of expanded clonal T cells represents our up-to-date Ag experience and determines the current state of our adaptive immune defenses, whereas the remaining diversity of the naive T cell population represents the reservoir of new specificities that is available to respond to new challenges and to sustain a balanced network of regulatory interactions (1, 7–12).
The potential for revealing a detailed view of individual TCR repertoires, which opened with development of high-throughput sequencing techniques (13–16), has attracted many laboratories to this presently superficially explored field. Multiple studies have been performed in recent years to characterize TCR repertoires either in general (13–22) or at the level of functional T cell subsets (17, 18, 23–33). However, these new opportunities have also brought new hidden pitfalls, which have not always been predicted, recognized, and proportionally addressed in the works published to date. As long as the analyzed material is of sufficient quality and quantity, shallow profiling at the scale of up to several thousands of expanded clonotypes can usually be reliably interpreted. At the same time, deep and reliable profiling of hundreds of thousands and millions of analyzed T cells and sequencing reads remains challenging. This is compounded by biased amplification and accumulation of PCR and sequencing errors (34–39), which must be distinguished from the true highly homologous TCR variants originating from convergent recombination events (18, 19, 40). Potential cross-sample contamination and sequencing artifacts can further hamper accurate comparison of TCR repertoires in different samples. As we have accumulated more experience with deep TCR repertoire profiling, these limitations have become clearer, along with the possible solutions for obtaining more quantitative data on clonotype diversity, minimizing the generation of artificial and loss of real receptor diversity (35, 37–39, 41, 42).
In this study, we employed our present molecular and software solutions to track both deeply and quantitatively the individual dynamics and stability of T cell clones during 3 y. We also reveal characteristic features of TCR repertoire at the polar extremes of the human life, including umbilical cord blood (UCB) and peripheral blood samples obtained from centenarians, and we describe the general dynamics and extent of changes in the structure of the TCR repertoire throughout life. Furthermore, we reveal gender differences in adaptive immunity aging and link this finding to previous data on population studies.
Materials and Methods
Sample collection
This study was approved by the local ethics committee and conducted in accordance with the Declaration of Helsinki. All donors were informed of the final use of their blood and signed an informed consent document. The cohort included 65 healthy individuals aged 6–103 y (see Table II). We excluded individuals who had previously been diagnosed with cancer or autoimmune disease. We obtained 6–10 ml peripheral blood from each donor. Each sample was collected into a number of EDTA-treated Vacutainer tubes (BD Biosciences, Franklin Lakes, NJ). PBMCs were extracted using Ficoll-Paque (Paneco, Kirov, Russia) density gradient centrifugation. Before RNA extraction, all PBMC samples were initially frozen, then thawed and incubated overnight in RPMI 1640 with 10% human serum (First Link, Birmingham, U.K.) in the presence of 15 U/ml IL-2 (Roche, Basel, Switzerland). According to our experience, such treatment increases the RNA expression levels of all TCR genes. For total RNA extraction, TRIzol reagent (Life Technologies, Carlsbad, CA) was used.
Donor Groups . | Average Age (Range) (y) . | Number of Patients (Female/Male/NK) . | Observed TCRβ CDR3 Diversity per 3 × 105 T Cells (×105) . |
---|---|---|---|
UCB | 0 | 8 (2/3/3) | 2.7 ± 0.1 |
Young | 16 (6–25) | 11 (4/7/0) | 2.2 ± 0.3 |
Middle age | 40 (30–50) | 13 (7/6/0) | 1.6 ± 0.5 |
Aged | 60 (51–75) | 18 (11/7/0) | 1.4 ± 0.6 |
Long-lived | 92 (85–103) | 23 (15/8/0) | 0.9 ± 0.4 |
Donor Groups . | Average Age (Range) (y) . | Number of Patients (Female/Male/NK) . | Observed TCRβ CDR3 Diversity per 3 × 105 T Cells (×105) . |
---|---|---|---|
UCB | 0 | 8 (2/3/3) | 2.7 ± 0.1 |
Young | 16 (6–25) | 11 (4/7/0) | 2.2 ± 0.3 |
Middle age | 40 (30–50) | 13 (7/6/0) | 1.6 ± 0.5 |
Aged | 60 (51–75) | 18 (11/7/0) | 1.4 ± 0.6 |
Long-lived | 92 (85–103) | 23 (15/8/0) | 0.9 ± 0.4 |
NK, not known.
Flow cytometry
Flow cytometry analysis was performed on a Cytomics FC 500 (Beckman Coulter). Data analysis was carried out using the Cytomics RXP analysis program (Beckman Coulter). For staining, aliquots of PBMCs were incubated for 20 min at room temperature with anti-human Abs CD3-PC7 (clone UCHT1, eBioscience), CD27-PC5 (clone 1A4CD27, Invitrogen), CD4-PE (clone 13B8.2, Beckman Coulter), and CD45RA-FITC (clone JS-B3, eBioscience) and then washed twice with PBS.
TCRβ libraries preparation and data analysis
cDNA synthesis and template switch to the SmartNNNa 5′-adaptor with a unique molecular identifier (5′-AAGCAGUGGTAUCAACGCAGAGUNNNNUNNNNUNNNNUCTT(rG)(rG)(rG)(rG)-3′), two-stage seminested PCR amplification, multiplexing, and sequencing were performed as described (5). Raw TCR sequencing data were processed using the MiGEC pipeline (39). Briefly, checkout utility was used for data sample barcode matching and unique molecular identifier sequence extraction. The data were then assembled using the assemble utility with the erroneous identifiers filtering option and at least two reads per molecular identifier group (MIG) oversequencing threshold. MIG oversequencing and erroneous identifiers histograms were calculated using the histogram utility. As in the present experimental setup, the deep repertoire sequencing was chosen in favor of high sequence coverage of individual cDNA molecules, and additional error correction was performed at the final stage of data processing. This was carried out using MiTCR software (37) with the “eliminate these errors” option for frequency-based error correction. All statistical analyses, clustering procedures, model fitting, and plotting were performed in R. For the cluster analysis and multidimensional scaling, the similarity measure was used, where fxi and fxy are the frequencies of an overlapped clonotype i in sample x and y, respectively, and N is the total number of overlapped clonotypes. This measure accounts for both overall overlap size and correlation of overlapping clonotype frequencies (42).
Normalization of repertoires overlap
The number of intersecting clonotypes (see Fig. 2E) for a fixed pair of samples d12 is proportional to the product of sample diversities d1 and d2, d12 = A × d1 × d2 (20). Still, on the population level the coefficient A may vary depending on sample pair, that is, A = A(d1,d2). With this in mind, we have fitted an empirical power model to our data and determined that A = A0 × (d1 × d2)−0.2 with r2 = 0.96 (n = 2628 pairs). Therefore, we have selected D = d12/(d1 × d2)0.8 as the measure for normalized overlap of unique TCRβ amino acid variants.
Overall changes in T cell immunity. Data are shown for UCB samples and peripheral blood samples collected from donors of different age groups (denoted by different colors). (A) Percentage of nonfunctional TCR β mRNA molecules (η2 = 0.27, p = 10−4). (B) Average number of added N nucleotides within CDR3 per clonotype (η2 = 0.44, p = 2 × 10−8). (C) TCRβ diversity observed per 300,000 T cells (η2 = 0.65, p = 2 × 10−15). (D) Percentage of naive T cells among all T cells as measured by FACS analysis (η2 = 0.72, p = 9 × 10−17). (E) Normalized number of TCRβ clonotypes shared between pairs of samples within each age group (η2 = 0.18, p < 2 × 10−16). (F) Percentage of naive CD4+ (light) and naive CD8 (dark) T cells among all T cells as measured by FACS analysis (η2 = 0.52 and 0.79, p = 2 × 10−9 and 4 × 10−20). All parameters have a significant association with age as shown by ANOVA test. Effect size η2 and p values are given in brackets.
Overall changes in T cell immunity. Data are shown for UCB samples and peripheral blood samples collected from donors of different age groups (denoted by different colors). (A) Percentage of nonfunctional TCR β mRNA molecules (η2 = 0.27, p = 10−4). (B) Average number of added N nucleotides within CDR3 per clonotype (η2 = 0.44, p = 2 × 10−8). (C) TCRβ diversity observed per 300,000 T cells (η2 = 0.65, p = 2 × 10−15). (D) Percentage of naive T cells among all T cells as measured by FACS analysis (η2 = 0.72, p = 9 × 10−17). (E) Normalized number of TCRβ clonotypes shared between pairs of samples within each age group (η2 = 0.18, p < 2 × 10−16). (F) Percentage of naive CD4+ (light) and naive CD8 (dark) T cells among all T cells as measured by FACS analysis (η2 = 0.52 and 0.79, p = 2 × 10−9 and 4 × 10−20). All parameters have a significant association with age as shown by ANOVA test. Effect size η2 and p values are given in brackets.
Results
Optimized approach for deep cDNA-based TCR repertoire profiling
RNA-based TCRβ libraries, where each cDNA molecule was labeled with a unique molecular identifier (43), were prepared as described previously (5) and analyzed via Illumina HiSeq 2500 paired-end 100 + 100 nt sequencing. In our downstream bioinformatics analysis, we grouped sequencing reads labeled with the same unique cDNA identifier to the MIG. Each nucleotide within the TCRβ CDR3 sequence in each MIG was identified according to the cumulative quality in the assembled sequencing reads. This approach improves sequencing quality and eliminates most PCR and sequencing errors according to the Safe-seqS logic described by Kinde et al. (44). Because MIG-based analysis counts initial cDNA events, it also eliminates amplification and sequencing biases and artifacts and thus significantly improves quantitative power in repertoire data analyses (5, 39, 45–48).
Importantly, we have accounted for artificial molecular identifier variants that result from errors within the identifier sequence itself (Supplemental Fig. 1A). To remove those, we filtered out the MIGs in which the molecular identifier had a single-mismatch parent with a ≥10-fold relative quantity of sequencing reads. Such correction is obligatory for accurate quantification of true, distinct cDNA events. We also counted only those uniquely labeled cDNA molecules that were sequenced at least twice, filtering out cDNA molecules covered by just a single sequencing read. This threshold additionally protects from artifacts and cross-sample contaminations, because at least two independent sequencing read events on the solid phase of Illumina are analyzed for each cDNA (Supplemental Fig. 1B). Notably, this filter is blind with respect to clonotype size, and it does not result in selective loss of naive T cells diversity, which would be the case if we filtered out the true singletons, that is, clones represented by a single cDNA molecule.
Because we sought to collect the maximum number of cDNA events for each sample in the deep profiling, we did not apply the full MiGEC error-correction logic (39), which would require oversequencing with a threshold of at least five reads per cDNA. Instead, we employed the “eliminate these errors” mode in MiiTCR software with MIG data for the assembly of clonotypes and additional frequency-based error correction (37).
To understand the quantitative power of the method, we first analyzed a frozen sample of a single blood draw from individual 1, which we divided into two unequal portions of PBMCs, and two separately frozen unequal PBMC samples obtained from a single blood draw of individual 2. Starting from 3 to 10 million PBMCs, TCRβ sequencing with MIG-based analysis yielded 1–2.6 million distinct, reliably sequenced TCRβ cDNA molecules per sample, which included 0.4–0.6 million distinct TCRβ clonotypes per sample (Table I). Analysis of data on the two split portions of a single thawed PBMC sample of individual 1 demonstrated almost absolute correlation in clonotype frequencies (r2 = 1.00, Fig. 1A). We therefore concluded that library preparation, sequencing, and data analysis pipeline introduce minimal quantitative differences between samples. However, we also observed that sampling, freezing, and thawing procedures may introduce some variance, because clonotype frequencies in the two separately processed and frozen PBMC samples of individual 2 demonstrated lower correlation (r2 = 0.92, Fig. 1A). In other analogous experiments, correlation between the two independent replicate samples ranged from ∼0.8 to ∼0.9 for three sample pairs (data not shown). Correspondingly, the relative homeostatic space occupied by shared clonotypes and the size of the top clonotypes were nearly identical between replicas, confirming the high accuracy of the method (Fig. 1B).
Donor . | Year of Sampling . | Age (y) . | PBMCs . | Raw Paired-End Sequencing Reads with CDR3 . | Analyzed cDNA Molecules with CDR3a . | Nucleotide TCRβ CDR3 Clonotypes . | Common Nucleotide Clonotypes . | Overlap (%)b . | Correlation of Shared Clonotype Frequencies (r2) . |
---|---|---|---|---|---|---|---|---|---|
Individual 1 | 2013 | 87 | 5 × 106 | 23,918,642 | 2,343,918 | 402,378 | 102,903 | 72 | 1.00 |
3 × 106 | 25,684,335 | 2,649,250 | 401,771 | ||||||
Individual 2 | 2013 | 25 | 10 × 106 | 6,207,486 | 1,260,339 | 570,534 | 37,917 | 43 | 0.92 |
5 × 106 | 6,101,552 | 1,032,585 | 467,651 | ||||||
Individual 3 | 2011 | 27 | 20 × 106 | 47,765,136 | 7,038,180 | 1,828,488 | 78,009 | 48 | 0.66 |
2014 | 30 | 9 × 106 | 10,464,702 | 1,531,400 | 760,839 | ||||
Individual 4 | 2011 | 47 | 20 × 106 | 44,028,675 | 6,040,218 | 965,644 | 36,854 | 64 | 0.82 |
2014 | 50 | 7.9 × 106 | 2,713,450 | 518,116 | 137,682 |
Donor . | Year of Sampling . | Age (y) . | PBMCs . | Raw Paired-End Sequencing Reads with CDR3 . | Analyzed cDNA Molecules with CDR3a . | Nucleotide TCRβ CDR3 Clonotypes . | Common Nucleotide Clonotypes . | Overlap (%)b . | Correlation of Shared Clonotype Frequencies (r2) . |
---|---|---|---|---|---|---|---|---|---|
Individual 1 | 2013 | 87 | 5 × 106 | 23,918,642 | 2,343,918 | 402,378 | 102,903 | 72 | 1.00 |
3 × 106 | 25,684,335 | 2,649,250 | 401,771 | ||||||
Individual 2 | 2013 | 25 | 10 × 106 | 6,207,486 | 1,260,339 | 570,534 | 37,917 | 43 | 0.92 |
5 × 106 | 6,101,552 | 1,032,585 | 467,651 | ||||||
Individual 3 | 2011 | 27 | 20 × 106 | 47,765,136 | 7,038,180 | 1,828,488 | 78,009 | 48 | 0.66 |
2014 | 30 | 9 × 106 | 10,464,702 | 1,531,400 | 760,839 | ||||
Individual 4 | 2011 | 47 | 20 × 106 | 44,028,675 | 6,040,218 | 965,644 | 36,854 | 64 | 0.82 |
2014 | 50 | 7.9 × 106 | 2,713,450 | 518,116 | 137,682 |
cDNA labeled by distinct molecular barcodes, each sequenced at least twice.
Relative overlap was calculated as average percentage occupied by shared clonotypes (see 2Materials and Methods).
Stability of individual TCR repertoire over time. (A) Scatter plots of clonotype frequencies in two replicas obtained from individuals 1 and 2, and in two sampling time points for individuals 3 and 4. Clonotypes shown are present in both sampling time points and replicas. Point size is scaled according to clonotype frequency. (B) Stack plots of clonotype concentrations for the four individuals. Two replicas are shown for individuals 1 and 2, and two time points are shown for individuals 3 and 4. The concentration changes of shared clonotypes are shown with colored areas.
Stability of individual TCR repertoire over time. (A) Scatter plots of clonotype frequencies in two replicas obtained from individuals 1 and 2, and in two sampling time points for individuals 3 and 4. Clonotypes shown are present in both sampling time points and replicas. Point size is scaled according to clonotype frequency. (B) Stack plots of clonotype concentrations for the four individuals. Two replicas are shown for individuals 1 and 2, and two time points are shown for individuals 3 and 4. The concentration changes of shared clonotypes are shown with colored areas.
Stability of the TCR repertoire
Earlier studies have demonstrated that human virus-specific memory T cell clones may be stable for many years (49–54). However, as far as we know, the full extent of changes that occur in a healthy individual’s TCR repertoire over the years has not yet been tracked with deep sequencing. To directly visualize how stable the human T cell repertoire is over time, we performed deep TCRβ profiling for peripheral blood samples obtained from two systemically healthy adult individuals at two different time points 3 y apart (Table I).
Comparison of the datasets obtained from the two time points revealed 78,009 and 36,854 clones present in both samples for individuals 3 and 4, respectively. The overlap calculated as the sum of geometric means of frequencies of shared clonotypes between the two time points was comparable to that of the two replicate samples obtained for individual 2.
Most persistent clones generally preserved their relative frequencies during this 3-y span (R = 0.861 and 0.905 for individuals 3 and 4, respectively; Fig. 1A). On average, the top 1000 clones exhibited a 2- and 1.8-fold change in concentration for individuals 3 and 4, respectively. Of the top 1000 clones observed at the first time point, 952 and 882 were identified at the second time point for individuals 3 and 4, respectively. Similarly, of the top 1000 clones observed at the second time point, 962 and 966 clones were identified at the first time point for individuals 3 and 4, respectively. Thus, we observed minimal differences in the expanded T cell repertoires across this time period, comparable with the differences between biological replicas obtained at the same time point.
Moreover, persistent clones cumulatively constituted from 45 to 75% of the total T cell counts in peripheral blood (Fig. 1B). The percentage occupied by persistent clones was in a good agreement with the percentage occupied by Ag-experienced T cells measured by flow cytometry (41 and 79% for individuals 3 and 4, respectively), indicating that most clonal expansions persist for years. To reveal the exact nature of persistent clones, we additionally performed deep TCRβ repertoire sequencing for the FACS-sorted T cells of individual 3. This analysis revealed that tracked clones are represented by both CD8+ and CD4+ Ag-experienced T cells in comparable proportions (∼5:3). Notably, tracked CD8+ clones were mostly represented by large clonal expansions, whereas CD4+ clones were characterized by lower expansion and ∼5-fold higher diversity. Therefore, some portion of stable but low-frequency CD4 clones could be missed in our analysis due to sampling limitations.
The space occupied by persistent clones enlarged with age in the elder individual, but shrank in the younger. Hypothetically, this could reflect faster turnover of T cell clones at younger ages and continuing expansion of long-lived memory T cell clones in the elderly, although this has to be confirmed in further studies using deep and long-term tracking of TCR repertoire dynamics on a cohort of donors of different ages.
Dynamics of TCR repertoires
In order to characterize the lifelong changes that affect TCR repertoires, we analyzed TCRβ repertoires for 8 UCB and 65 peripheral blood samples of healthy individuals of different ages (Table II, raw sequencing data deposited in the Sequence Read Archive database, accession number PRJNA316572). This cohort included the 39 individuals previously published in Britanova et al. (5), which we reanalyzed in the present study for better resolution as described above. To equalize and correctly compare these 73 samples, we analyzed 300,000 randomly chosen TCRβ cDNA molecules per sample, each labeled with a unique molecular identifier. Considering the excess (millions) of T cells used for RNA purification, that is approximately equivalent to 300,000 randomly chosen T cells. Additionally, we performed flow cytometry analysis for the population of naive T cells in the CD8+ and CD4+ populations (CD27high/CD45RAhigh, in 68 of 73 samples, Supplemental Fig. 2) and counted absolute numbers of CD3+ cells (in 36 of 73 samples).
This comparative analysis revealed a number of hallmarks that characterize T cell immunity at different life periods. 1) When compared with any other age group, UCB samples were characterized by a significantly higher percentage of nonfunctional TCRβ mRNA molecules representing either out-of-frame TCR variants or those containing a stop codon within their CDR3 (p < 0.05 versus 6–25 y age group and p < 0.001 versus other age groups, two-tailed t test, Holm correction; Fig. 2A). Because the level of nonfunctional TCRβ mRNA is determined by nonsense-mediated decay (NMD) mechanisms (55–57), this should be a consequence of NMD suppression during the prenatal period. Interestingly, it has been demonstrated that low oxygen slows down NMD (58), so one of the possible explanations is the lower oxygen in UCB compared with peripheral blood (59). At the same time, this could also be the consequence of more specific NMD regulatory mechanisms that may play important roles in development (58, 60–62). 2) UCB samples were characterized by a significantly lower numbers of added N nucleotides within CDR3 (average per clonotype) than other age groups (p < 5 × 10−13, two-tailed t test, Holm correction; Fig. 2B). Similar observations were previously reported in both mice and humans, based on a few sequenced clonotypes and TdT RNA expression analysis (63–66). In this study, we confirm this feature of human TCR repertoires at the deep sequencing scale. Notably, this feature did not influence TCRβ diversity observed per 300,000 T cells, which was the highest in UCB compared with other samples. Almost each captured cDNA encoded a distinct TCRβ variant, resulting in >250,000 distinct clonotypes per sample (Fig. 2C). The highest individual TCRβ diversity observed in UCB correlated with the highest percentage of naive T cells, and both parameters declined at later ages with similar kinetics (Fig. 2D). Therefore, importantly, low TdT activity during the prenatal period lowers the entropy but does not limit the observed TCR diversity. 3) At the same time, low TdT activity in the fetal period influenced relative cross-individual similarity of UCB repertoires. We studied the average number of TCRβ clonotypes shared between two people within each age group, normalized based on sample diversity (see 2Materials and Methods). These metrics do not account for clonal size and reflect the general similarity of naive T cell repertoire diversities. Correspondingly, this overlap was high between UCB samples, where low TdT activity limits theoretical diversity of produced TCR β variants (67), but further decreased to lower levels and remained stable with aging (Fig. 2E). 4) Based on flow cytometry data, we observed different dynamics in the percentages of naive CD4 and naive CD8 T cells within the total T cell population. Naive CD4+ T cells represented more than half of all T cells in UCB, whereas naive CD8+ T cells represented ∼20%. The ratio of CD4 and CD8 naive subsets tend to equalize in the young age (Fig. 2F), along with the increase in proportion of Ag-experienced CD8+ T cells (Supplemental Fig. 3A, 3B), although additional analysis for a larger cohort is required to confirm statistical significance of these changes. The percentage of naive T cells decreased with age (Fig. 2D), with a more rapid decline observed within the CD8+ subset (Fig. 2F). As a result, naive CD4+ T cells dominated over the naive CD8+ T cells in UCB and in old age, but not in the young age (Fig. 2F, Supplemental Fig. 3C). Considering the decrease of total T cell counts in blood with aging (68) (Supplemental Fig. 3E), the decrease of the absolute counts of naive T cells per microliter of peripheral blood was even more dramatic (Supplemental Fig. 3F). We further analyzed the ratio of the proportion of naive T cells within the CD4+ subset to the proportion of naive T cells within the CD8+ subset. This parameter is not dependent on the CD4+/CD8+ cell ratio and thus reflects to which extent the CD4+ subset is naive compared with the CD8+ subset, irrespective of their relative abundance. This analysis confirmed our previous observation (5) that the T cell population of the oldest individuals is specifically characterized by a relatively more naive state of CD4+ relative to CD8+ T cell subset (Supplemental Fig. 3D). 5) We further examined how the functional similarity of TCR repertoires changed with age. To this end, we analyzed how the share of repertoire represented by amino acid CDR3 clonotypes that are common for the two individuals correlates with their age (accounting for relative clonotype size, as opposed to the normalized number of shared sequence variants discussed above and shown in Fig. 2E). This analysis revealed that functionally TCR repertoires are more similar in childhood, and they subsequently diverge from this common core repertoire with aging (r2 = 0.54, n = 73, p = 4 × 10−15; Fig. 3A–C). This high share of common T cells in UCB samples and in childhood probably results from multiple public clonotypes produced due to recombinatorial biases (19, 69), which are abundantly represented within naive but not Ag-experienced T cells. Furthermore, input of prenatal TdT-free TCR repertoire in UCB and potentially in childhood should add to abundance of high-frequency public clonotypes (Fig. 2B). With aging, the proportion of naive T cells goes down. Additionally, high-frequency public clonotypes, initially abundant, gradually decrease their frequency within the naive T cell subset (M. Pogorelyy and D. Chudakov, unpublished observations for several individuals), because peripheral homeostatic proliferation that supports naive T cell numbers after decline of the thymic function (70) is not uniform (71–73). Both processes should result in a decrease of the share occupied by common T cells with aging. These considerations are supported by a decrease in the degree of convergence, measured as the average number of nucleotide CDR3 sequences per each amino acid CDR3 variant (Fig. 3D). 6) Next we analyzed repertoires to identify amino acid TCRβ clonotypes annotated as specific for common pathogens, including CMV and EBV, collected via literature mining (Supplemental Table I). The number of such clonotypes decreased with age (r2 = 0.31, n = 73, p = 1 × 10−7; Fig. 3E). At the same time, the average size of known viral-specific clonotypes increased with age, with a 7-fold growth observed for the 51–75 y age group compared with UCB samples (p = 0.02, Holm correction; Fig. 3F). This observation suggests that naive T cells from young individuals comprise a substantial number of virus-specific precursors due to the evolutionary programmed probabilities of recombination events, which are further supplanted by expansion of stochastically selected primed clones. Interestingly, the average size of viral-specific clonotypes decreased in very old age (Fig. 3F). This could result either from the exhaustion of Ag-specific T cell clones, or from age-selection barrier favoring longevity of individuals with either lower CMV and EBV burden (74, 75), or from lower clonal dominance in CMV- and EBV-specific responses. 7) We performed comparison of TCRβ repertoire diversity between donors of different sex within each age group. This analysis revealed that TCRβ diversity decreases more rapidly in middle age for males than for females (Fig. 4A, p = 0.02, two-tailed t test with adjustment for multiple comparison). This corresponds with a faster decline in the percentage of naive T cells in middle-aged males within both the CD4+ and CD8+ subsets (Fig. 4B, Supplemental Fig. 4). At the same time, no gender differences were observed at very old age. This is aligned with the fact that the survival advantage of females versus males vanishes with aging (76) (see below).
Repertoire divergence with age. (A–C) Cluster analysis of repertoire pairs using a distance measure proportional to the sum of frequencies of overlapping clonotypes. (A) Dendrogram constructed using hierarchical clustering. (B) Multidimensional scaling of samples. The plot in (C) shows the distance metric computed as the Euclidean distance of a sample in multidimensional scaling coordinates to the centroid coordinates obtained by averaging the x- and y-coordinates of all samples. (D) Convergence of V(D)J recombination, as calculated by the average number of nucleotide CDR3 sequences per amino acid CDR3 sequence. (E) The total number of CMV- and EBV-specific clonotypes, annotated based on sequences derived from the literature. (F) Average size of a pathogen-specific clonotype. Sample labels are colored according to age, color bar.
Repertoire divergence with age. (A–C) Cluster analysis of repertoire pairs using a distance measure proportional to the sum of frequencies of overlapping clonotypes. (A) Dendrogram constructed using hierarchical clustering. (B) Multidimensional scaling of samples. The plot in (C) shows the distance metric computed as the Euclidean distance of a sample in multidimensional scaling coordinates to the centroid coordinates obtained by averaging the x- and y-coordinates of all samples. (D) Convergence of V(D)J recombination, as calculated by the average number of nucleotide CDR3 sequences per amino acid CDR3 sequence. (E) The total number of CMV- and EBV-specific clonotypes, annotated based on sequences derived from the literature. (F) Average size of a pathogen-specific clonotype. Sample labels are colored according to age, color bar.
Age-dependent changes in T cell repertoire in males and females. (A) Observed TCRβ diversity per 300,000 T cells. Note gender difference in the 30–50 y age group. (B) Percentages of naive CD4+ and CD8+ T cells among all T cells for the 30–50 y age group. F, females; M, males.
Age-dependent changes in T cell repertoire in males and females. (A) Observed TCRβ diversity per 300,000 T cells. Note gender difference in the 30–50 y age group. (B) Percentages of naive CD4+ and CD8+ T cells among all T cells for the 30–50 y age group. F, females; M, males.
Discussion
Capable of undergoing complex dynamic changes and intelligent adaptation to various challenges, our T cell immunity can also be remarkably stable over time. In this study, we demonstrate that, in middle age, persistent clones may occupy more than half of the homeostatic space in the peripheral blood, barely changing their frequency over several years (Fig. 1). This might be important for not only protecting against those infections that the immune system is already familiar with (49–53, 77), but also for sustaining balanced regulation of the existing social network of immune cells (78, 79).
The establishment of this social network starts at a very early age and is manifested by highly dynamic changes in T cell immunity and TCR repertoire. Indeed, comparative analysis of UCB relative to peripheral blood samples from young children reveals that the native T cell repertoire that we are born with is characterized by a number of specific features that disappear in the earliest years of life (Fig. 2).
Cord blood samples are characterized by lower numbers of added nucleotides within CDR3, reflecting low TdT activity in the fetal period (Fig. 2B). This feature limits the overall theoretical diversity of TCR variants (80, 81), which was reflected in the relatively high numbers of clonotypes shared among UCB samples (Fig. 2E). At the same time, this limitation did not fundamentally alter the observed TCRβ diversity (Fig. 2C). This indicates, importantly, that the fetal TCRβ repertoire, albeit structurally maintained within regulated borders, is not actually limited in the number of diverse sequence variants that may encounter Ag in terms of defined volume.
CD4+ T cells dominate over CD8+ T cells, and naive CD4+ T cells dominate over naive CD8+ T cells in the cord blood, but both ratios tend to equalize in early childhood (Fig. 2F, Supplemental Fig. 3A–C), which may partially result from rapid expansion of the newly primed CD8+ T cell clones. Correspondingly, the percentage of naive CD4+ T cells of all T cells decreases from ∼50% in UCB to ∼30% in childhood (Supplemental Fig. 3B).
Later in life, involution of thymus, limited proliferative capacity of naive T cells, and expansion of Ag-experienced clones (1–5) lead to a consistent decrease in percentages of T cell homeostasis occupied by naive T cells (Fig. 2D), and consequently in a decrease of the observed TCR diversity (Fig. 2C). Furthermore, a decrease of intrinsic diversity (Ref. 6 and M. Pogorelyy and D. Chudakov, unpublished observations for several individuals) and smearing of the initially high-frequency clonotypes within the naive T cells pool (M. Pogorelyy and D. Chudakov, unpublished observations) occur with aging as a result of nonuniform homeostatic proliferation on the periphery (70–73). We demonstrate in the present study that the dynamics of the TCR diversity decrease differ for males and females. By the age of 40, the the difference in TCR diversity becomes prominent (Fig. 4A). The difference is mainly determined by the more rapid shrinkage of naive T cells pool (Fig. 4B) and might be a consequence of gender variability of thymic output (82) and other gender-specific factors. Paradoxically, however, men and women in the oldest age cohort are characterized by similar TCRβ diversity and percentages of naive T cells (Fig. 4A, Supplemental Fig. 4). Additionally, individuals in this age group are characterized by higher percentages of naive T cells within the CD4+ subset compared with that within the CD8+ subset (Supplemental Figs. 3D, 4).
Both of these phenomena could be explained by the age-selection barrier, which introduces bias to the oldest cohort. In this view, balanced function of the adaptive immune system in older age depends on the availability of sufficient counts of naive CD4+ T cells, as well as their diversity and domination over the counts of naive CD8+ T cells. Broadly speaking, potential regulators should dominate over potential effectors. Given that those individuals whose immune system fulfills these criteria have higher chances of living beyond the age of ∼70 y, this would influence the observed characteristics of the oldest age group and contribute to a leveling of the previously observed gender differences. This model suggests a common explanation both for the observation by Ferguson and colleagues (83) that an inverted CD4+/CD8+ ratio in aged people is associated with increased 2-y mortality and for that of Pawelec and colleagues (84) that there is, paradoxically, an inverse correlation between the overall percentage of naive CD8+ T cells and survival. It is also in agreement with the point that the survival advantage of females versus males declines at very old age (76), when we observe equalization of TCRβ diversity and percentages of naive T cells in males and females. These results may indicate that relative abundance of naive CD4+ T cells in middle age essentially contributes to the generally increased longevity observed in women (85, 86).
Unfortunately, we have no information on CMV infection status of the individuals in our study, which is known to influence immunosenescence (87). We could expect that the cohort of oldest individuals who overcame the age-selection barrier is enriched with CMV-negative individuals, for which the percentage of naive CD4+ T cells was shown to be more stable with aging (88). This could ultimately lead to conclusion that, although gender differences determine the difference in life expectancy in middle age, in older age the factor of CMV status prevails.
Finally, we have demonstrated that the repertoires of different individuals are functionally more similar at birth and early childhood and subsequently diverge during aging (Fig. 3A–D). Paraphrasing Leo Tolstoy, all young people’s TCR repertoires are alike; each old individual TCR repertoire is different (89). At the same time, the observation that individuals are born with repertoires containing a high number of public pathogen-specific clonotypes (Fig. 3E) could be indicative of human–pathogen coevolution. One could speculate that we start our lives with roughly similar and evolutionary predesigned immune repertoires, which then diverge into a multitude of states as we age. The specific nature of these divergent states is determined by environmental experiences throughout life and the stochastic nature of particular immune responses. The latter means that if two hypothetical individuals with an identical TCR repertoire (never existed, because even TCR repertoires of monozygotic twins are different) (22) encounter exactly the same Ag, their repertoire will still diverge, as was recently suggested by Walczak and colleagues (90) on a theoretical basis.
Our study has several limitations. In particular, for most individuals we have not sorted naive and memory, CD4+ and CD8+ populations for TCR repertoire profiling, as should be preferably performed (6), but ideally performed with molecular barcoding (5) and with oversequencing that allows to eliminate PCR and sequencing errors without losing the true diversity of convergent TCR variants (39). We also have not analyzed TCRα repertoire diversity. Per se, TCRα repertoire profiling would not add much because TCRβ repertoire is more diverse. However, combinatorial diversity of TCR α- and β-chains (91) is important. Paired analysis of TCR α- and β-chain repertoires that should ultimately become possible (92, 93) will strengthen the studies of adaptive immunity dynamics, allowing to differentiate between clonal (α and β) versus clonotype (α or β) fates. Future works employing more targeted strategies to track the long-term fate of T cell subsets and Ag-specific clones in a high-throughput manner should reveal the whole picture of life-long adaptive immunity dynamics in detail.
Acknowledgements
The work was carried out in part using equipment provided by the Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry Core Facility.
Footnotes
This work was supported by Russian Science Foundation Project 16-15-00149. M.S. and E.V.P. are supported by individual fellowship Russian Foundation for Basic Research Grants 16-34-60179 and 16-34-60178.
The raw sequencing data in this article have been submitted to the National Center for Biotechnology Information Sequence Read Archive database (https://www.ncbi.nlm.nih.gov/Traces/sra/) under accession number PRJNA316572.
The online version of this article contains supplemental material.
References
Disclosures
The authors have no financial conflicts of interest.