## Abstract

We measured CD8 T cell clonotypic diversity to three epitopes recognized in C57BL/6 mice infected with mouse hepatitis virus, strain JHM, or lymphocytic choriomeningitis virus. We isolated epitope-specific T cells with an IFN-γ capture assay or MHC class I/peptide tetramers and identified different clonotypes by Vβ chain sequence analysis. In agreement with our previous results, the number of different clonotypes responding to all three epitopes fit a log-series distribution. From these distributions, we estimated that >1000 different clonotypes responded to each immunodominant CD8 T cell epitope; the response to a subdominant CD8 T cell epitope was modestly less diverse. These results suggest that T cell response diversity is greater by 1–2 orders of magnitude than predicted previously.

The adaptive immune system must be able to recognize a large repertoire of foreign Ags, while including enough naive precursor cells to any Ag to generate an infection-controlling response. The diversity of the αβ TCR response results from imprecise recombination of the V, D, and J regions of α- and β-chains coupled with pairing of one β-chain and one to two α-chains per T cell (1). The theoretical diversity of the T cell repertoire is ∼10^{15} (2), but the total number of CD8 T cells in a single mouse is estimated to be 2 × 10^{7}, with a diversity of 2–5 × 10^{5} different clonotypes per mouse spleen (3, 4).

The number of precursor cells specific to a single epitope appears to be too low to estimate via direct isolation with current techniques. Several alternative approaches to this problem have yielded very different values for T cell clonotype diversity. For instance, Maryanski et al. and Bousso et al. (5, 6) exploited the limited CD8 T cell response to HLA-A2 and HLA-CW3 in DBA/2 mice (H-2K^{d} restricted). CD8 T cells responding to single peptides within these Ags primarily use the Vβ10 element with a complementarity-determining region 3 (CDR3)^{3} length of 6 or 10 aa. By isolating and sequencing TCR cDNA clones corresponding to this subgroup, the authors estimated that ∼200 cells or 15–20 different clonotypes per spleen recognized this epitope. In another approach, the precursor frequency to the H-2D^{b}-restricted gp33–41 epitope of lymphocytic choriomeningitis virus (LCMV, epitope gp33) was estimated from the number of transferred LCMV-specific TCR transgenic CD8 T cells required to give a response equivalent to the endogenous response (4). The authors calculated that there were ∼100–200 CD8 T cells or 20 different clonotypes specific for epitope gp33 in the naive adult C57BL/6 (B6) mouse.

Recently, we used an alternative approach to determine the precursor frequency to a well-defined, immunodominant CD8 T cell epitope recognized in mice infected with mouse hepatitis virus (strain JHM) (7). Intranasal infection with this virus causes acute encephalitis with a robust T cell response (8). The CD8 T cell response in B6 mice is directed against an immunodominant and a subdominant epitope, encompassing residues 510–518 (epitope S510, H-2D^{b} restricted) and residues 598–605 (epitope S598, H-2K^{b} restricted) of the spike (S) glycoprotein (9, 10). To determine the number of epitope S510-specific CD8 T cells in the infected CNS, we isolated epitope-specific T cells with MHC class I/peptide tetramers and sequenced CDR3 regions from individual cDNA clones. The frequency of epitope S510-positive Vβ13-expressing clones fits a log-series distribution. We estimated that there were 300–500 epitope S510-specific CD8 T cell clonotypes present in the acutely infected CNS. This number is higher than estimated previously (4, 5), but is still probably an underestimate, because it does not consider redundancy in the T cell population or the contribution of the α-chain to TCR diversity.

To address the generality of our T cell diversity estimates and to examine the response in a secondary lymphoid tissue, we determined the number of clonotypes responding to epitopes S510 and S598 in JHM-infected mice and to epitope gp33 in LCMV-infected mice, after i.p. inoculation.

## Materials and Methods

### Mice

Specific pathogen-free B6 mice were purchased from the National Cancer Institute (Bethesda, MD). Animal studies were approved by the University of Iowa Animal Care and Use Committee.

### Virus

JHM was grown and titered, as described (11). Mice were infected i.p. with 1.5 × 10^{5} PFU JHM or 5 × 10^{5} PFU Armstrong strain of LCMV in PBS.

### IFN-γ capture assay

Cells were harvested from spleens of JHM-infected mice 7 days after infection. To isolate epitope S510- or S598-specific CD8 T cells, unfractionated splenocytes (1 × 10^{7}/ml) were stimulated for 4 h with EL-4 cells (at a 1:50 ratio) and 1 μM of S510 peptide (CSLWNGPHL) or S598 peptide (RCQIFANI) (12). After stimulation, epitope-specific CD8 T cells were captured using the MACS mouse IFN-γ secretion assay (Miltenyi Biotec, Auburn, CA) following the manufacturer’s protocol for frequencies of IFN-γ-secreting T cells <2%. Cells were additionally stained with FITC-conjugated anti-CD8 Abs. IFN-γ-expressing CD8 T cells were sorted with a FACS DiVa (BD Biosciences, San Jose, CA). A total of 50,000–220,000 epitope S510- or S598-specific cells was obtained from each sort. Under our sorting conditions, virtually no IFN-γ-expressing CD8 T cells were detected when cells were stimulated with no added peptide or when cells from uninfected mice were stimulated with JHM-specific peptide (Fig. 1).

### Staining with tetramers

Splenocytes harvested from LCMV-infected mice were stained with MHC class I/gp33 peptide (KAVYNFATM) tetrameric complexes (kindly provided by K. Messingham and J. Harty, University of Iowa), as described (7). A total of 100,000 tetramer gp33-positive CD8 cells was sorted with a FACS DiVa. As a negative control, cells were also exposed to irrelevant tetramer (specific for JHM epitope S510) (Fig. 1).

### Isolation of RNA from lymphocytes

RNA was isolated from lymphocytes using an RNeasy Mini Kit (Qiagen, Valencia, CA).

### Sequence analysis

cDNA was synthesized from each entire sorted lymphocyte RNA preparation in a 60 μl reaction, as described (13). A total of 5 μl of cDNA was used for PCR. For synthesis of Vβ5-, Vβ8-, and Vβ13-specific PCR products, Vβ5-specific (CCCAGCAGATTCTCAGTCCAAC), Vβ8-specific (CATGGGCTGAGGCTGATCCATT), and Vβ13-specific (CCTAAAGGAACTAACTCCACTCT) primers were used with a common Cβ reverse primer (GCAATCTCTGCTTTTGATGGCTC) and *Taq* polymerase (AmpliTaq Gold DNA Polymerase; Applied Biosystems, Foster City, CA). Amplification was conducted for 25 cycles, as described previously (13). The PCR error rate was 0.12% (2/1680, based on sequence analysis of conserved residues of the Vβ and Jβ chains). PCR products were cloned into plasmid vector pCR2.1-TOPO, as described by the manufacturer (Invitrogen, Carlsbad, CA). Colonies containing inserts were identified and sequenced with the Vβ primers above and an automated ABI Prism 3700 Sequencer (Applied Biosystems).

### TCR repertoire analysis

We fit log-series distributions to our clonotype diversity data. This distribution is often used to describe data with a few abundant and many rare species (14, 15). A log-series distribution has the form *s*_{i} = α*x*^{i}/*i*, where *s*_{i} is the number of CDR3 expected to be represented *i* times, α is an index of clonal diversity, and *x* is a parameter related to diversity and sample size (number of cDNA clones sequenced). The diversity index α depends on *S*, the number of different (at the nucleotide level) CDR3 species present in the T cell population, and *N*, the total number of T cells in the population: *S* = α ln(1 + *N*/α). The value of α is large when the number of different CDR3 is high relative to the number of T cells. Our calculation of α followed Krebs (16) and was implemented in a computer program written in QuickBASIC 4.5 (Microsoft, Redmond, WA). Briefly, we estimated the parameter *x* using an iterative solution to the equation *S*/*N* = [(1 − *x*)/*x*][−ln(1 − *x*)], and then calculated our estimate of α as α = *N*(1 − *x*)/*x* (see Ref. 16, equations 12.12 and 12.13). The expected number of clonotypes found in a larger sample of cells (say, an entire spleen) is easily calculated: *S*′ = α ln(1 + *N*′/α). Sample calculations are included in the Appendix. SEs for α were obtained as described (17).
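The fitting procedure described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the QuickBASIC program used for the analyses: the function names `fit_log_series` and `expected_species` are hypothetical, and bisection is assumed as the iterative solver because the text does not specify one.

```python
import math

def fit_log_series(n_cells, n_species, tol=1e-12):
    """Solve S/N = [(1 - x)/x][-ln(1 - x)] for x, then alpha = N(1 - x)/x.

    Bisection is an assumed choice of iterative method; the ratio on the
    right-hand side falls monotonically from ~1 to ~0 as x goes from 0 to 1.
    """
    if not 0 < n_species < n_cells:
        raise ValueError("need 0 < S < N")
    target = n_species / n_cells

    def ratio(x):
        return (1.0 - x) / x * -math.log(1.0 - x)

    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:
            lo = mid  # ratio still too high, so the root lies at larger x
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    alpha = n_cells * (1.0 - x) / x
    return x, alpha

def expected_species(alpha, n_cells):
    """Expected number of clonotypes in a sample of n_cells: S' = alpha ln(1 + N'/alpha)."""
    return alpha * math.log(1.0 + n_cells / alpha)

# Appendix example: 100 cells sequenced, 10 distinct clonotypes.
x, alpha = fit_log_series(100, 10)
print(f"x = {x:.7f}, alpha = {alpha:.4f}")
print(f"clonotypes expected among 1000 cells: {expected_species(alpha, 1000):.1f}")
```

Applied to the Appendix example (*N* = 100, *S* = 10), the sketch should return values close to those quoted there for *x*, α, and *S*′.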

In a log-series distribution, the modal frequency is always 1 (that is, there are more clonotypes represented once than with any other integer frequency). However, this does not necessarily mean that most clonotypes are singletons, or even that most clonotypes are rare, because the total number of common clonotypes (summing, say, the frequencies of clonotypes represented 10 times, 11 times, 12 times, …) can be substantial. For instance, if *S* = 10 and *N* = 100 (10 distinct clonotypes in 100 sequenced cells), then ∼7 of the 10 clonotypes are expected to be present at least twice (Appendix). For each mouse, goodness-of-fit of the data to a log series was assessed using a *G* test with Williams’ correction (18), using observed and expected counts for five frequency classes (sequences represented once, 2–3 times, 4–7 times, 8–15 times, and 16 or more times; results were robust to different lumpings). Calculation of expected frequencies is illustrated in the Appendix. We similarly tested goodness-of-fit to uniform frequency distributions. To increase statistical power, we combined the test results across all data sets using Fisher’s method (17) to yield a single, powerful test of each distributional hypothesis.
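As a rough illustration of how the expected counts for the five frequency classes can be obtained, the following Python sketch evaluates *s*_{i} = α*x*^{i}/*i* and groups the terms. The function names are hypothetical, and assigning the remainder *S* minus the lower classes to the open-ended "16 or more" class is an assumption; it relies on the fact that the *s*_{i} of a fitted log series sum to *S*. The inputs are the Appendix values of α and *x*.

```python
def expected_counts(alpha, x, max_copies):
    """Expected number of clonotypes seen exactly i times: s_i = alpha * x**i / i."""
    return {i: alpha * x**i / i for i in range(1, max_copies + 1)}

def class_totals(alpha, x, n_species):
    """Expected clonotypes per frequency class (1, 2-3, 4-7, 8-15, 16+).

    Assumption: the open-ended class receives whatever remains of S after
    the four bounded classes are summed.
    """
    s = expected_counts(alpha, x, 15)
    bounds = {"1": (1, 1), "2-3": (2, 3), "4-7": (4, 7), "8-15": (8, 15)}
    totals = {
        name: sum(s[i] for i in range(lo, hi + 1))
        for name, (lo, hi) in bounds.items()
    }
    totals["16+"] = n_species - sum(totals.values())
    return totals

# Appendix example: S = 10 clonotypes among N = 100 cells.
alpha, x, n_species = 2.7663, 0.9730825, 10
totals = class_totals(alpha, x, n_species)
for name, value in totals.items():
    print(f"{name:>5} copies: {value:.2f} clonotypes expected")
print(f"expected to be present at least twice: {n_species - totals['1']:.1f} of {n_species}")
```

The last line of output corresponds to the statement that roughly 7 of 10 clonotypes are expected to be present at least twice even though the modal frequency is 1.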

## Results

### Detection of epitope S510-specific T cells

To extend our studies to a second CD8 T cell epitope, we used an IFN-γ capture assay to isolate epitope S598-specific T cells from the spleens of mice after inoculation with JHM. Previously, we isolated epitope S510-specific CD8 T cells with MHC class I/peptide tetramers (7). However, epitope S598 had low functional avidity (amount of peptide required to sensitize target cells for lysis) (10) and consistent with this, epitope S598-specific tetramers rapidly dissociated from T cells, resulting in poor cell recovery after cell sorting. In preliminary experiments, intact RNA suitable for RT-PCR was not reproducibly isolated after detection of epitope-specific cells by conventional intracellular IFN-γ assay. The IFN-γ capture assay does not require fixation of cells, and so allows isolation of high quality RNA. We measured precursor frequencies in the spleen after i.p. infection, in part because cell recovery from the CNS was low after IFN-γ capture, and in part to document the repertoire in a secondary lymphoid organ. In initial experiments, we detected the same percentage of Ag-specific T cells with the IFN-γ capture as with intracellular IFN-γ-staining assays. We also infected mice with 5 × 10^{4}, 1.5 × 10^{5}, or 5 × 10^{5} PFU of JHM and found that 1.5 × 10^{5} PFU gave a maximal T cell response.

Initially, we measured the number of CD8 T cell clonotypes responding to epitope S510 and compared it with our previous results for cells isolated from the infected CNS. Epitope S510-specific T cells were sorted by flow cytometry and Vβ13-expressing cells were analyzed further because these cells were most abundant in the epitope S510 population (7). Under our conditions of cell sorting, virtually no cells were IFN-γ^{+} in the absence of peptide S510 stimulation (Fig. 1). We analyzed 58 and 52 cDNA clones from each of two mice, with 29 and 31 different CDR3 detected; 17 and 22 of these were identified in single clones (Table I). As in the CNS, the response was polyclonal with diverse Jβ usage, but preferential usage of Jβ2.1 and Jβ2.6 (Supplementary Table I)^{4} (7). Only one CDR3 was common to both mice, but 4 of 60 of these spleen-derived CDR3 were previously detected in the CNS (7) (Supplementary Table I).

| Mouse | Vβ | No. Analyzed^{a} | No. Species^{b} | Abundance (no. copies)^{c}: 1 | 2 | 3 | 4 | 5–7 | 8–15 | >16 |
|---|---|---|---|---|---|---|---|---|---|---|
| JHM S510-1 | Vβ13 | 58 | 29 | 17 | 8 | 2 | 2 | 2 | 0 | 0 |
| S510-1 | Vβ8 | 66 | 31 | 22 | 4 | 2 | 0 | 1 | 1 | 1 |
| S510-2 | Vβ13 | 52 | 31 | 22 | 4 | 3 | 2 | 1 | 0 | 0 |
| JHM S598-1 | Vβ5 | 49 | 29 | 16 | 7 | 1 | 1 | 2 | 0 | 0 |
| S598-2 | Vβ5 | 51 | 29 | 19 | 7 | 0 | 0 | 1 | 1 | 0 |
| S598-3 | Vβ5 | 56 | 23 | 15 | 2 | 1 | 2 | 1 | 2 | 0 |
| LCMV gp33-1 | Vβ8 | 85 | 40 | 21 | 10 | 3 | 2 | 3 | 1 | 0 |
| gp33-2 | Vβ8 | 89 | 45 | 26 | 7 | 4 | 2 | 3 | 1 | 0 |

^{a}Number of cDNA clones analyzed per mouse.

^{b}Number of different cDNA clones identified.

^{c}Number of cDNA clones within each abundance class.

As for CNS-derived cells (7, 19), frequencies of splenic epitope S510-specific Vβ13-expressing CD8 T cell clonotypes fit a log-series distribution well (Fig. 2, Table II; in no case could a log series be rejected at *p* = 0.05). For each sample, α, a measure of CDR3 diversity, was calculated and used to estimate the abundance of TCR Vβ13 cDNA clonotypes per 10,000 CD8 T cells and per spleen (Table II). These data can be extrapolated to the entire population of epitope S510-specific CD8 T cells, assuming that each Vβ subrepertoire is similarly diverse. Because ∼22% of epitope S510-specific T cells express Vβ13 (7), 1100–1500 different clonotypes per spleen must have recognized epitope S510. This calculation assumes that diversity is uniformly distributed among Vβ families. To address this directly, we performed similar analyses on Vβ8-expressing, epitope S510-specific T cells using RNA harvested from mouse S510-1 (Tables I and II). This data set, considered by itself, differed just significantly from log series (*p* = 0.043), with the deviation being in the direction of too many rare clonotypes (that is, the clonotype frequency distribution is even more uneven than log series). However, the apparent significance of this deviation should not be overinterpreted, because we conducted eight simultaneous goodness-of-fit tests (Table II). The probability of at least one type I error (false positive) in eight tests is not negligible (∼33%), and if we had applied a sequential Bonferroni adjustment (20) to correct for the multiple tests, the deviation of the Vβ8 data set from log series would not have reached significance (data not shown). Assuming that a log-series distribution is not inappropriate for these data, we calculated that there were ∼206 Vβ8-expressing CD8 T cell clonotypes per spleen. Because Vβ8-expressing cells comprise 14% of epitope S510-specific CD8 T cells, this extrapolates to 1471 different epitope S510-specific clonotypes per spleen, in excellent agreement with the number calculated from the Vβ13-specific population. By comparison, we previously calculated (7) that there were 350–500 different S510-specific clonotypes within the infected CNS. This difference arose in part because there are more epitope S510-specific CD8 T cells in the spleen than in the CNS, but the number of different cDNA clones per 10,000 epitope S510-specific T cells was also higher by 2- to 3-fold.
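The extrapolation from α to clonotypes per 10,000 cells, per spleen, and across all Vβ families can be illustrated with a short Python sketch. The function names are hypothetical; the α values and cell numbers are those reported in Table II, and the Vβ fractions (22% for Vβ13, 14% for Vβ8) are those given in the text. Because the published α values are rounded, the output may differ from the tabulated numbers by a clonotype or so.

```python
import math

def clonotypes_in_population(alpha, n_cells):
    """Expected clonotypes among n_cells: S' = alpha * ln(1 + N'/alpha)."""
    return alpha * math.log(1.0 + n_cells / alpha)

def extrapolate_to_all_vbeta(clonotypes, vbeta_fraction):
    """Scale one V-beta family to the whole epitope-specific response.

    Assumes, as the text does, that every V-beta subrepertoire is
    similarly diverse.
    """
    return clonotypes / vbeta_fraction

# (label, alpha, epitope-specific cells of that V-beta per spleen, V-beta fraction)
samples = [
    ("S510-1 Vb13", 27.08, 310_000, 0.22),
    ("S510-1 Vb8", 22.80, 197_000, 0.14),
    ("S510-2 Vb13", 35.42, 438_000, 0.22),
]
for label, alpha, n_cells, fraction in samples:
    per_10k = clonotypes_in_population(alpha, 10_000)
    per_spleen = clonotypes_in_population(alpha, n_cells)
    total = extrapolate_to_all_vbeta(per_spleen, fraction)
    print(f"{label}: {per_10k:.0f} per 10,000 cells, "
          f"{per_spleen:.0f} per spleen, {total:.0f} across all V-beta families")
```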

| Mouse | Vβ | Vβ13, Vβ5, or Vβ8 Epitope-Specific Cells/Spleen (no.) | α (*p*)^{a} | Frequency of Vβ13, Vβ5, or Vβ8 Clonotypes (per 10,000)^{b} | Total No. of Vβ13, Vβ5, or Vβ8 Clonotypes/Infected Spleen (CI)^{b} | Total No. of Clonotypes/Spleen^{c} |
|---|---|---|---|---|---|---|
| JHM S510-1 | Vβ13 | 310,000 | 27.08 (0.60) | 160 | 253 (170–333) | 1,150 |
| S510-1 | Vβ8 | 197,000 | 22.80 (0.043) | 139 | 206 (139–271) | 1,471 |
| S510-2 | Vβ13 | 438,000 | 35.42 (0.73) | 200 | 334 (226–437) | 1,518 |
| JHM S598-1 | Vβ5 | 160,000 | 24.70 (0.87) | 148 | 217 (141–289) | 775 |
| S598-2 | Vβ5 | 350,000 | 26.27 (0.23) | 156 | 249 (163–332) | 889 |
| S598-3 | Vβ5 | 1,030,000 | 14.59 (0.24) | 95 | 163 (100–224) | 582 |
| LCMV gp33-1 | Vβ8 | 1,280,000 | 29.5 (0.95) | 172 | 315 (223–404) | 1,050 |
| gp33-2 | Vβ8 | 2,510,000 | 32.7 (0.78) | 187 | 368 (264–469) | 1,227 |

^{a}Values of *p* from a *G* test for goodness-of-fit to the estimated log-series distribution; *p* < 0.05 indicates significant departure of the data from the log-series distribution.

^{b}Numbers assume fit to the estimated log-series distributions. CI, 95% confidence interval.

### Diversity of splenic CD8 T cell responses to epitope S598

Vβ5-expressing cells were analyzed because they made up ∼28% of epitope S598-specific cells. Within this population, Jβ usage was diverse, with no Jβ1 or Jβ2 elements used preferentially (Supplementary Table II). Clonotype frequencies fit a log-series distribution well (Table II). As described above, we calculated that there were 160–250 Vβ5-expressing epitope S598-specific CD8 T cell clonotypes per infected spleen, or 600–900 total S598-specific clonotypes per spleen (Table II). Six of 75 clonotypes were present in more than one mouse. We also attempted to analyze the epitope S598-specific T cell response in the CNS of mice with acute encephalitis using MHC class I tetramers. Despite poor cell recovery (see above), we were able to identify a limited number of CDR3 (∼28 sequences from 5 brains). Three CDR3 were common to both splenic and CNS CD8 T cell samples. No CDR3 were common to the epitope S510- and S598-specific CD8 T cell populations.

### Diversity of CD8 T cell response to LCMV epitope gp33

Blattman et al. (4) estimated that the precursor frequency to the LCMV-specific H-2D^{b}-restricted gp33 epitope was ∼20 clonotypes per mouse, 1–2 orders of magnitude lower than we calculated for JHM-specific epitopes. Therefore, we determined the epitope gp33-specific precursor frequency using MHC class I (H-2D^{b})/peptide tetramers to identify gp33-specific CD8 T cells in spleens of LCMV-infected mice (Fig. 1, *D* and *E*). Tetramers were used because epitope gp33 is presented by both H-2D^{b} and H-2K^{b} Ag (21) and the IFN-γ capture assay would detect both responses. We sequenced CDR3 from Vβ8-expressing cells (comprising 30% of the H-2D^{b}-restricted tetramer gp33-positive cells (22)) and compared our results directly with those of Blattman et al. Clonotype frequencies fit a log-series distribution well (Table II). There were 1000–1200 different H-2D^{b}-restricted epitope gp33-specific T cell clonotypes in LCMV-infected mice (Table II; Supplementary Table III). This number is very similar to the number of clonotypes recognizing epitopes S510 or S598 in JHM-infected mice. The Jβ1.1 element was used by ∼50% of epitope gp33-specific T cell clonotypes, and aspartic acid was the first residue in a majority of the CDR3. In several cases, distinct nucleotide sequences encoded the same CDR3; this redundancy was not as common in the epitope S510- or S598-specific CDR3.

### Testing appropriateness of log-series and uniform frequency distributions

Even after combining the results of goodness-of-fit tests across all data sets, we cannot reject the hypothesis of adequate fit to log-series distributions (Fisher’s method, χ^{2} = 14.6, df = 16, *p* = 0.55). In contrast, we can strongly reject the hypothesis that clonotype frequencies follow a uniform distribution, as applied by other workers (4, 23). Across all data sets (Table II), clonotype frequencies differ strongly and significantly from expected under the uniform-distribution hypothesis (Fisher’s method, χ^{2} = 55.2, df = 16, *p* < 0.0001).
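Fisher's method combines *k* independent *p* values into the statistic χ^{2} = −2 Σ ln *p*_{i}, which is referred to a χ^{2} distribution with 2*k* degrees of freedom. The Python sketch below applies it to the eight log-series goodness-of-fit *p* values listed in Table II. The function name is hypothetical, and the closed-form tail probability used here holds only because the degrees of freedom are even; a statistics library could be substituted. The per-data-set *p* values for the uniform-distribution tests are not reported, so that result is not reproduced.

```python
import math

def fisher_combined(p_values):
    """Fisher's method: chi2 = -2 * sum(ln p), df = 2k, plus the upper-tail p value."""
    k = len(p_values)
    chi2 = -2.0 * sum(math.log(p) for p in p_values)
    half = chi2 / 2.0
    # For even df = 2k the chi-square upper tail has the closed form
    # exp(-chi2/2) * sum_{j=0}^{k-1} (chi2/2)**j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return chi2, 2 * k, math.exp(-half) * total

# Log-series goodness-of-fit p values from Table II (eight data sets).
log_series_p = [0.60, 0.043, 0.73, 0.87, 0.23, 0.24, 0.95, 0.78]
chi2, df, p = fisher_combined(log_series_p)
print(f"chi2 = {chi2:.1f}, df = {df}, p = {p:.2f}")
```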

### Effect of sequencing additional cDNA clones on SE of α

In our analyses, we sequenced 50–100 cDNA clones and inferred the number of sequences from a fitted log series. Because this is a small fraction of the total number of epitope-specific CD8 T cells in the spleen, we estimated the likely improvement in our diversity estimates had we sequenced more clones, assuming that the underlying clonal diversity really is log series with α, as estimated. Benefits of further sequencing were modest: sequencing 1000 clones (vs 50) would decrease the SE of α only by a factor of 2 (Table III).

| Assumed No. Sequenced | Inferred Clonotypes/Spleen | Inferred Clonotypes/10,000 Cells |
|---|---|---|
| 50 | 334 (138–439)^{b} | 200 (138–257) |
| 100 | 334 (246–419) | 200 (151–246) |
| 200 | 334 (260–406) | 200 (159–239) |
| 300 | 334 (266–400) | 200 (162–236) |
| 400 | 334 (270–396) | 200 (164–234) |
| 500 | 334 (272–394) | 200 (166–233) |
| 1000 | 334 (279–388) | 200 (169–230) |

^{a}Based on recalculation of SE, assuming same estimate of α and sequencing a larger number of cDNA clones.

^{b}Numbers in parentheses are an approximate 95% confidence interval (estimate ± 2 SE).

## Discussion

Our analyses suggest that the number of CD8 T cell clonotypes in the naive mouse able to respond to a single epitope is substantially greater than previously reported. It is unlikely that our approach much overestimated precursor frequency. The IFN-γ capture assay detects cells that secrete IFN-γ in response to specific peptides, and not cells that secrete, instead, cytokines such as IL-4 or IL-10. Our assay depended on T cell activation, which could bias the results if transcription rates of TCR chain mRNA were very different among epitope-specific T cells; however, this is apparently not so (24, 25). Finally, our calculations do not consider redundancy in T cells expressing the same CDR3 RNA sequence, the contribution of α-chain usage to diversity, or the possibility that different sets of T cells might be activated after i.p. vs intranasal or intracranial inoculation. These caveats make it likely that we actually underestimated precursor frequencies.

It is unlikely that many of our sequences derive from contaminating cells or from *Taq* polymerase-induced errors. First, gating was based on stringent criteria, with virtually no cells detected in the absence of specific peptide or peptide/tetramer, or in naive animals (Fig. 1). Second, the vast majority of epitope-specific sequences differed by more than 1 nt, except for the gp33-specific Vβ8-Jβ1.1 population. Because CDR3 were no more than 36 nt in length (Supplementary Tables I–III) and the *Taq* polymerase error rate was 0.12%, it is unlikely that these 1-nt differences in the gp33-specific Vβ8-Jβ1.1 population resulted from polymerase errors. It is more likely that these closely related sequences define a motif characteristic of this population and thereby serve as an indirect control for specificity.
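The polymerase-error argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that the 0.12% error rate applies per nucleotide and that errors at different positions are independent; neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
def prob_any_error(length_nt, per_nt_error):
    """Probability that a sequence of length_nt contains at least one error,
    assuming independent errors at a constant per-nucleotide rate."""
    return 1.0 - (1.0 - per_nt_error) ** length_nt

# Longest CDR3 considered (36 nt) and the measured error rate (0.12%).
p = prob_any_error(36, 0.0012)
print(f"chance that a 36-nt CDR3 carries a PCR error: {p:.1%}")
```

Under these assumptions only a few percent of the longest CDR3 sequences would be expected to carry any polymerase error, and shorter CDR3 fewer still.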

Distributions of splenic epitope-specific CD8 T cells were well described by log series. Assuming that log-series distributions can be extrapolated to include clonotypes present at very low copy number, we predict that, for Vβ8-expressing epitope gp33-specific CD8 T cells, 200 clonotypes will be very rare (<0.3% of the total epitope-specific cells per spleen), with 20–40 represented by single cells (illustrated in Fig. 2, *D* and *E*, for mouse gp33-2). Proving this directly would entail sequencing more cDNA clones than is feasible, so our estimates must be considered approximations to the number of different CDR3 actually present. However, it is not plausible that low-frequency clonotypes are entirely absent. Goodness-of-fit tests consistently rejected a uniform-distribution hypothesis for our clonotype diversity data, and this means that the likelihood of sampling clonotype frequency distributions like ours from T cell populations without substantial numbers of rare clonotypes was negligible. Of course, we cannot reject the possibility that the true T cell clonotype-frequency distribution is not truly log series, but merely log series-like with similar numbers of rare clonotypes; for this reason too, we emphasize that our estimates of CDR3 diversity are approximations.

The log-series distribution provides a good empirical fit for our data, but the biological mechanism for this distribution is not currently understood. Differences in the number of precursor cells for each clonotype in the naive population or in the duration of Ag exposure may contribute to the development of a log-series distribution. Although recent publications suggest that, once activated, T cells are committed to an extensive proliferative program, exceeding eight cellular divisions (26, 27, 28), differences in proliferative potential past eight divisions would result in a nonuniform distribution of T cell clonotypes. A nonuniform distribution would also result if T cell proliferation and differentiation were dependent upon the cumulative strength of the signal resulting from exposure to APCs and cytokines, as has also been postulated (29).

We detected limited commonality in CDR3 usage between different animals. Only 1 of 59, 6 of 72, and 6 of 59 of epitopes S510-, S598-, and gp33-specific T cell clonotypes (at the amino acid sequence level) were detected in more than one animal. These numbers agree with our previous study of epitope S510-specific CD8 T cells in CNS of mice with acute encephalitis (7) and with other studies showing T cell clonotypes recognizing individual epitopes to be largely nonoverlapping among animals (5, 30).

In previous reports, precursor frequencies to single epitopes were estimated at ∼200 cells per spleen, or 20 different clonotypes (4, 5, 6), much lower than our estimates. We suspect that different assumptions about clonotype diversity distributions largely account for the discordant results. In one study, Bousso et al. (5) analyzed naive and immune repertoires of CD8 T cells responding to epitope A2–170 (H-2K^{d} restricted). However, their analysis differed from ours in that they assumed they had sequenced all clonotypes in the spleen (i.e., assumed a truncated frequency distribution with no unobserved clonotypes). Because approximately one-third of their clonotypes were observed only once, unobserved clonotypes were not unlikely. In addition, the HLA-A2-derived peptide differs from a homologous sequence in the H-2K^{d} Ag by only 3 aa, which may contribute to a limited CD8 T cell response. In the second study, the precursor frequency of CD8 T cells responding to the LCMV gp33 epitope was estimated using transfer of transgenic T cells (4). However, the estimated clonotype diversity was depressed by the assumption that equal numbers of marker and endogenous T cells after exposure to Ag imply equal numbers of precursor cells. Because the marker T cells were monoclonal, that assumption will fail if the frequency distribution of endogenous T cell clonotypes is nonuniform. Our data (and see also Ref. 5) are clearly inconsistent with a uniform distribution. If the actual distribution were approximately log series (resembling our data and Ref. 5), the calculated number of precursors would be much higher and closer to our estimates (Table II).

Recently, Gorski and colleagues (31) showed that the distribution of memory human CD8 T cells recognizing an influenza A epitope included, like the distributions reported in this work, a large number of low-frequency clonotypes, which could be described by a power law-like distribution. When we fit these data to a log-series distribution, two of three data sets were well described, whereas for the third the number of clonotypes represented only once was somewhat underestimated. Although both methodologies describe the data, the log-series distribution is simpler and does not require that the data be separated into high- and low-frequency clonotypes before analysis.

Assuming that there are 2–5 × 10^{5} different CD8 T cell clonotypes in the naive mouse spleen and that ∼10^{3} clonotypes respond to each epitope, as estimated in this study, our data suggest that only 2–5 × 10^{2} different antigenic epitopes could be recognized if each TCR clonotype recognized a single epitope. This is much lower than previous estimates and suggests that T cells must exhibit extensive cross-reactivity for the immune system to respond to a large number of Ags (32). These data are also consistent with reports highlighting the existence of allogeneic CD8 T cell responses after viral challenge and of T cells generated in response to one virus, but able to recognize a second, heterologous virus (33).

## Appendix

### Sample calculation of log-series diversity

Imagine that 100 cells were sequenced, yielding a total of 10 distinct sequences, or clonotypes (3 clonotypes represented once; 1, twice; 1, 3 times; 1, 6 times; 1, 10 times; 1, 17 times; 1, 23 times; and 1, 36 times). The log-series fit uses only the total number of cells (*N* = 100) and of distinct sequences (*S* = 10). Iterative solution of the equation *S*/*N* = [(1 − *x*)/*x*][−ln(1 − *x*)] gives *x* = 0.9730825, and using α = *N*(1 − *x*)/*x*, we find α = 2.7663. We can test the adequacy of the log-series fit to these data by calculating expected frequencies, using the form *s*_{i} = α*x*^{i}/*i*: in this case, we find *s*_{1} = 2.69, *s*_{2} = 1.31, *s*_{3} = 0.85, and so on. Notice that 73% of clonotypes are expected to be represented multiple times, and 27% to be singletons. These expected frequencies (and also the observed frequencies) are then grouped into classes (1 time, 2–3 times, 4–7 times, 8–15 times, 16+ times) and goodness-of-fit is assessed using standard *G* tests (in this example, the fit is excellent, with *G* = 0.64, *p* > 0.85). Finally, the expected number of clonotypes found in a larger sample of cells is given by *S*′ = α ln(1 + *N*′/α): for example, if 1000 cells were sequenced, *S*′ = 16.3 distinct clonotypes would be expected.

## Acknowledgements

We thank Dr. V. Badovinac for splenocytes from LCMV-infected mice, Dr. K. Messingham for providing MHC class I/peptide gp33 tetramer, and Dr. J. Harty for critical review of the manuscript.

## Footnotes

This research was supported by grants from the National Institutes of Health (NS40438) and National Multiple Sclerosis Society (RG 2864-B-3).

^{3}Abbreviations used in this paper: CDR3, complementarity-determining region 3; JHM, mouse hepatitis virus; LCMV, lymphocytic choriomeningitis virus; S, surface glycoprotein.

^{4}The on-line version of this article contains supplemental material.

## References

^{+}splenocytes selected on nonpolymorphic MHC class I molecules.

^{b}-restricted CTL response.

^{+}T cell epitopes within the surface glycoprotein of a neurotropic coronavirus and correlation with pathogenicity.

^{+}T cell differentiation: initial antigen encounter triggers a developmental program in naive cells.

^{+}T cell response.

^{+}memory T cell repertoire could optimize potential for immune responses.