Abstract
Hepatitis C virus (HCV) vaccine efficacy may crucially depend on immunogen length and coverage of viral sequence diversity. However, covering a considerable proportion of the circulating viral sequence variants would likely require long immunogens, which, for the conserved portions of the viral genome, would contain unnecessarily redundant sequence information. In this study, we present the design and in vitro performance analysis of a novel “epitome” approach that compresses frequent immune targets of the cellular immune response against HCV into a shorter immunogen sequence. Compression of immunological information is achieved by partially overlapping shared sequence motifs between individual epitopes. At the same time, sequence diversity coverage is provided by taking advantage of emerging cross-reactivity patterns among epitope variants, so that epitope variants associated with the broadest variant cross-recognition are preferentially included. The processing and presentation analysis of specific epitopes included in such a compressed, in vitro-expressed HCV epitome indicated effective processing of a majority of tested epitopes, although re-presentation of some epitopes may require refined sequence design. Together, the present study establishes the epitome approach as a potentially powerful tool for vaccine immunogen design, especially suitable for the induction of cellular immune responses against highly variable pathogens.
Hepatitis C virus (HCV) is estimated to have infected >170 million individuals worldwide. The host’s immune response fails to clear the virus in 70–80% of cases, leading to chronic hepatitis C infection. Although IFN-α-based therapy can be effective, it is expensive, associated with severe side effects, and least successful in treating some of the most frequent genotypes (i.e., 1a, 1b) in some regions (reviewed in Ref. 1). Given these limitations, the only promising, and likely the only sustainable, approach to control the HCV pandemic will be the development of an effective vaccine. While the role of neutralizing Abs in HCV control is unclear (2, 3), strong and broadly directed T cell responses have consistently been associated with viral clearance, both in human HCV infection and in the chimpanzee model (4, 5, 6). However, a likely important hurdle for the design of an effective and widely applicable vaccine is the extensive global sequence diversity of HCV. This high genetic variability is also thought to play an important role in the outcome of HCV infection and viral immune evasion (7, 8). In particular, mutations within targeted CTL epitopes causing reduced binding to HLA class I molecules and/or impaired TCR recognition (9, 10), as well as mutations in flanking regions of presented epitopes that abrogate effective epitope processing, have been reported (11).
Aside from impacting host control of HCV infection, the viral heterogeneity also complicates the reliable in vitro detection of cellular immune responses to this pathogen. In the past, such analyses have mostly been based on consensus sequences or genotype reference sequence-based test reagents, unavoidably leading to an underestimation of the total response rates to this virus. As seen for HIV, this may especially affect regions of the viral genome that are characterized by higher entropy, as in these cases the test reagent and the autologous viral sequence are on average the most divergent (12, 13, 14). As a consequence, the HCV-specific CTL response patterns described to date may be considerably impacted by the use of one or the other test reagent, severely limiting the rational selection of appropriate epitope candidates or regions of the viral proteome for vaccine immunogen design. In this report, we show the potential usefulness of a novel approach for the condensation of experimentally confirmed, relevant immunological information in immunogen sequences for vaccine design. This strategy, referred to as the “epitome” approach, combines viral sequence diversity and T cell cross-reactivity data to identify frequently targeted, widely cross-reactive CTL epitopes, which are then condensed into a short epitome of potentially high immunogenicity (15). In this proof-of-concept study, we show the design of an HCV-specific epitome and demonstrate that defined epitopes can be successfully processed from an in vitro-expressed epitome and presented in the context of the restricting HLA class I molecule.
Materials and Methods
Study subjects
A total of 40 individuals positive for anti-HCV Abs were enrolled in this study. All HIV-HCV-coinfected patients (n = 34) had CD4 T cell counts of >250 cells/μl to increase the chance of detecting HCV-specific CTL activities (16). Six healthy individuals, negative for anti-HCV and anti-HIV Abs, were included as negative controls. HLA class I typing was performed for 30 of the 46 tested subjects using standard SS-PCR techniques (Table I).
No. | Patient Identifier | HCV Status | HCV Viral Load | Infecting Genotype | CD4 Count (cells/μl) | HLA-A | HLA-B | HLA-C |
---|---|---|---|---|---|---|---|---|
1 | L8 24 | Chronic | 890,000 | 1a | 418 | 11/68 | 14/1401 | 08/08 |
2 | L8 100 | Clearer | <5 | ND | 391 | 01/23 | 0801/1516 | 07/14 |
3 | L8 07 | Chronic | 111,000 | 1a | 395 | 02/31 | 35/51 | 04/1601 |
4 | L8 36 | Chronic | 1,190,000 | 1a | 483 | 26/33 | 38/4901 | 07/12 |
5 | L8 96 | Chronic | ND | 1b | 392 | 74/8001 | 18/57 | 02/07 |
6 | L8 236 | Clearer | <5 | 1a | 470 | 01/02 | 37/51 | 06/16 |
7 | L8 139 | Chronic | ND | 1a | 319 | 0201/1101 | 3501/4001 | 0301/0403 |
8 | L8 133 | Clearer | <5 | 1a | 311 | 22/66 | 40/44 | 04/08 |
9 | L8 175 | Chronic | 1,380,000 | 1b | 380 | 02/30 | 1516/4201 | 14/17 |
10 | L8 170 | Chronic | 7,160 | 1a | 309 | 02/33 | 15/44 | 02/16 |
11 | L8 28 | Chronic | 10,800 | 3a | 389 | 31/39 | 44/05 | 12 |
12 | F7 18 | Chronic | ND | 1b | 296 | 31/33 | 14/40 | 0304/0802 |
13 | F7 106 | Chronic | ND | 1a | 407 | 02/03 | 41/45 | 16/17 |
14 | L8 23 | Chronic | ND | 1a | 468 | 23/32 | 18/8101 | 07/08 |
15 | L8 117 | Chronic | ND | 1a | 443 | 01/02 | 44/57 | 05/06 |
16 | L8 118 | Chronic | 2,940,000 | 1a | 385 | 02/03 | 27/51 | 02/16 |
17 | L8 155 | Chronic | ND | 3a | 469 | 02/02 | 44/4901 | 05/07 |
18 | L8 156 | Clearer | <5 | 1a | 486 | 1101/1101 | 3501/5101 | 0401/1502 |
19 | L8 157 | Chronic | 51,400 | 3a | 363 | 01/11 | 35/55 | 03/1402 |
20 | L8 180 | Clearer | ND | 1b | 486 | 0201/0202 | 3501/4501 | 1402/1601 |
21 | L8 47 | Clearer | ND | 1b | 590 | 02/11 | 38/46 | 01/0702 |
22 | L8 79 | Clearer | ND | 1a | 878 | 02/32 | 1518/44 | 05/07 |
23 | L8 198 | Chronic | ND | 1b | 1200 | ND | ND | ND |
24 | L8 213 | Chronic | ND | ND | HIV- | ND | ND | ND |
25 | L8 214 | Chronic | ND | ND | HIV- | ND | ND | ND |
26 | L8 217 | Chronic | 10,400,000 | ND | HIV- | 0101/3010 | 4101/5701 | 0602/0701 |
27 | F7 120 | Chronic | ND | 1a | 726 | 02/11 | 35/41 | 04/06 |
28 | F7 148 | Chronic | ND | 1b | 387 | 02/36 | 27/53 | 04/15 |
29 | F7 150 | Chronic | ND | 3a | 542 | 01/23 | 08/57 | 06/07 |
30 | L8 230 | Chronic | 700,000 | 1a | 390 | 02/30 | 15/42 | 14/17 |
31 | L8 205 | Chronic | ND | 1a | 317 | 1101/3101 | 3905/5101 | 0202/0702 |
32 | L8 271 | Chronic | 3,030,000 | 2b | 404 | ND | ND | ND |
33 | L8 62 | Clearer | <5 | ND | 497 | 02/03 | 35/4901 | 07/1601 |
34 | P53 | Chronic | ND | 1b | HIV- | 02/31 | 49/62 | 03/07 |
35 | P84 | Chronic | ND | 1b | HIV- | 02/02 | 12/15 | 01/03 |
36 | P86 | Chronic | ND | 1b | HIV- | 02/32 | 07/53 | 04/07 |
37 | L8 29 | Chronic | 1,710,000 | ND | 273 | 02/33 | 35/57 | 04/07 |
38 | L8 50 | Chronic | 1,490,000 | 1b | 268 | 03/24 | 44 | 1403/1601 |
39 | L8 257 | Clearer | <5 | ND | 364 | ND | ND | ND |
40 | L8 282 | Clearer | ND | 2b | 308 | ND | ND | ND |
41 | C-1 | Healthy ctrl | HCV− | — | HIV− | ND | ND | ND |
42 | C-2 | Healthy ctrl | HCV− | — | HIV− | ND | ND | ND |
43 | C-3 | Healthy ctrl | HCV− | — | HIV− | 24/26 | 40 | 02/15 |
44 | C-4 | Healthy ctrl | HCV− | — | HIV− | ND | ND | ND |
45 | C-5 | Healthy ctrl | HCV− | — | HIV− | 0102/02 | 51/57 | 06/15 |
46 | C-6 | Healthy ctrl | HCV− | — | HIV− | ND | ND | ND |
ctrl, Control.
HCV sequence data
Amino acid sequences for positions 1259–1455 of the HCV nonstructural protein 3 (NS3) were obtained from the Los Alamos HCV database (http://hcv.lanl.gov) and supplemented with recently published sequences from our laboratory (17). Overall, sequences from 11 individuals infected with HCV genotype 1a (G1a) and from 81 G1b-infected subjects were obtained from the HCV database. To achieve comparable numbers of G1a and G1b sequences, single bulk-sequencing-based sequences from 73 G1a-infected individuals were included (17). The 165 sequences were aligned and analyzed for the presence of unique 10-mer sequences to design a peptide set of 406 unique 10-mer peptides that spanned the 197-aa-long region of NS3 and covered all 10-mer variants that were present in at least 5% of the aligned sequences.
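The 10-mer enumeration described above can be illustrated with a minimal sketch. It assumes that the aligned sequences are plain strings of equal length, that windows containing gap (`-`) or ambiguity (`X`) characters are skipped, and that a variant is identified by its start position together with its sequence; the function name `frequent_kmers` and these conventions are illustrative assumptions, not the procedure actually used in this study.

```python
from collections import Counter

def frequent_kmers(aligned_seqs, k=10, min_fraction=0.05):
    """Return {(start, kmer): fraction} for k-mers present in at least
    min_fraction of the aligned sequences (hypothetical helper)."""
    n = len(aligned_seqs)
    counts = Counter()
    for seq in aligned_seqs:
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            # Assumption: windows with gaps or ambiguous residues are ignored.
            if "-" in kmer or "X" in kmer:
                continue
            # Each sequence contributes a given (start, kmer) at most once.
            counts[(start, kmer)] += 1
    return {key: c / n for key, c in counts.items() if c / n >= min_fraction}

# Toy alignment: 40 sequences of 12 residues, differing only at the last position.
toy = ["ABCDEFGHIJKL"] * 36 + ["ABCDEFGHIJKM"] * 3 + ["ABCDEFGHIJKN"]
for (start, kmer), fraction in sorted(frequent_kmers(toy).items()):
    print(start, kmer, f"{fraction:.3f}")
```

In the toy alignment, the variant seen in 3 of 40 sequences (7.5%) is retained, whereas the variant seen in a single sequence (2.5%) falls below the 5% cutoff.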
In vitro T cell expansion and ELISPOT assay
PBMC were isolated as previously described (14). For in vitro expansion of HCV-specific cells, PBMC were resuspended in R10 medium (RPMI 1640 supplemented with 10% heat-inactivated FCS (Sigma-Aldrich), 2 mM L-glutamine, 50 U/ml penicillin, 50 μg/ml streptomycin, and 10 mM HEPES (Mediatech)) and plated in four wells of a 24-well plate at 2 × 10⁶ cells/well. Four additional aliquots of 2 × 10⁶ PBMC were each pulsed with one of four peptide pools (each containing ∼100 of the 406 10-mer peptides) at a final concentration of 10 μg/ml per peptide. After washing the pulsed cells once in HBSS, they were added to the plated cells. Cells were fed with IL-2 (50 IU/ml) on day 3 and then twice a week thereafter. On day 13, expanded cells were washed three times, incubated overnight in R10 medium without IL-2, and used in IFN-γ ELISPOT assays the next day as described previously (14). Each pool-stimulated culture was tested against the peptides present in the stimulating peptide pool using a peptide matrix approach (18). One well served as positive control (PHA at 1.8 μg/ml) and three to four wells per pool served as negative controls in which cells were incubated in medium alone. ELISPOT plates were developed as described elsewhere (18) and spots were counted using an AID ELISPOT Reader Unit (Autoimmun Diagnostika). A response was considered positive if it reached at least five spots per well and exceeded the mean plus 3 SDs of the negative control wells. Pool responses exceeding the cutoff were then deconvoluted and the individual targeted peptide was reconfirmed in a subsequent ELISPOT assay.
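The positivity rule for ELISPOT wells can be written as a short check. This is a sketch under stated assumptions: the function name `is_positive` is hypothetical, and the sample standard deviation is used because the text does not specify whether the sample or the population SD was applied.

```python
from statistics import mean, stdev

def is_positive(spots, negative_wells, min_spots=5, n_sd=3):
    """True if a well has at least min_spots spots and exceeds the mean
    plus n_sd standard deviations of the negative control wells."""
    # Assumption: sample SD; requires at least two negative control wells.
    cutoff = mean(negative_wells) + n_sd * stdev(negative_wells)
    return spots >= min_spots and spots > cutoff

# Three negative control wells per pool, as in the assay layout described above.
print(is_positive(12, [0, 1, 2]))  # above both criteria
print(is_positive(6, [2, 4, 6]))   # enough spots, but within background noise
```

Both conditions must hold, so a well with many spots is still scored negative when the background in the medium-only wells is high or variable.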
Retroviral expression of HCV epitome sequence and epitope re-presentation
The epitomized immunogen protein sequence was extended by adding an N-terminal methionine residue as a translation initiation signal. The amino acid sequence was back-translated into codon-optimized cDNA (BlueheronBio). EcoRI restriction sites were added at both ends for insertion into the retroviral vector pMSCV/BlaR. Recombinant retroviruses were produced in HEK293 cells cotransfected with the pMSCV/BlaR vector and the pVSV-G and pGag/Pol plasmids (BD Clontech). Retrovirus was used to transduce APC, including EBV-transformed B cell lines (B-LCL), the T1 cell line, and stably transduced, single HLA allele-expressing 221 cells. Blasticidin S (5 μg/ml) was added to the medium to select for transduced B-LCL. To test for processing and re-presentation of specific epitopes included in the expressed epitome, specific T cell lines were generated against peptides N101 (HPNIEEVALS, HLA-B07 restricted), N115 (IPFYGKAIPL, HLA-B5101 restricted), N148a (KLVALGVNAV, HLA-A02 restricted), and N178 (ATDALMTGYT, HLA-A01 restricted) and stimulated (3 × 10⁵ cells) using either stably epitome-expressing APC (1 × 10⁵ cells) or APC transduced with retrovirus carrying the empty vector, with or without additional pulsing with the respective peptide (10 μg/ml). Anti-CD28 and anti-CD49d mAb (1 μg/ml each; BD Biosciences) and brefeldin A (10 μg/ml) were added as previously described (19). Flow cytometric analysis of intracellular IFN-γ production was performed on an LSR-II cytometer using FACSDiva software (BD Biosciences).
Statistical analysis
Statistical analysis was performed using GraphPad Prism version 4.0. All data were analyzed using a nonparametric Mann-Whitney U test or a two-sided Spearman rank test. Values of p < 0.05 were considered significant.
Results
Identification of 10-mer sequences providing extensive diversity coverage of NS3 positions 1259–1455
Vaccine candidates able to protect against heterologous viral challenge will likely need to either include extensive viral sequence diversity or be able to induce immune responses with effective cross-reactive potential. Although linear coverage of even a small portion of the circulating viral diversity will lead to possibly unmanageably long immunogen sequences, the identification of epitope sequences associated with broadly cross-reactive response patterns to other epitope variants may help overcome this hurdle. In the present study, we focused on a region of the HCV genome with moderate sequence diversity and selected a span of 197 aa within NS3 to assess T cell reactivity to a broad set of sequence variants. This region was also selected because it contained a number of well-described CTL epitopes that served as internal controls (20). Based on viral sequence data available at the Los Alamos National Laboratory HCV database (http://hcv.lanl.gov) and 74 recently published NS3 sequences from our laboratory (17), a total of 165 NS3 sequences (84 and 81 sequences for the 1a and 1b genotypes, respectively) were aligned. To generate a sensitive peptide test set that could detect responses to most of the sequence variants present in this alignment, the frequency of all unique 10-mer peptides in the 165 sequences was determined. Of 2050 unique 10-mer sequences identified, 406 were found in at least 5% of the aligned sequences and were synthesized as a peptide set of 10-mers overlapping by 9 aa and spanning the entire 197-residue-long stretch of NS3. Starting at position NS3–1259, the overlapping 10-mer peptides were named N1, N2, and so on up to N188, with N1a, N1b, etc., referring to sequence variants of the respective 10-mer. This peptide set also covered all CTL epitopes that had previously been identified in this region and which are restricted by multiple HLA class I alleles (20).
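The peptide naming scheme implies a simple mapping between a peptide name and its NS3 coordinates, sketched below. The helper `olp_span` is a hypothetical name, and the assumption that variant suffixes are lowercase letters follows the N1a, N1b pattern described above.

```python
import re

REGION_START = 1259  # first NS3 position covered by the peptide set
REGION_END = 1455    # last NS3 position covered
K = 10               # peptide length

def olp_span(name):
    """Map a peptide name such as 'N148a' to its (first, last) NS3 position."""
    match = re.fullmatch(r"N(\d+)[a-z]*", name)
    if match is None:
        raise ValueError(f"unrecognized peptide name: {name}")
    index = int(match.group(1))
    # A 197-residue region holds 197 - 10 + 1 = 188 overlapping 10-mers.
    n_windows = (REGION_END - REGION_START + 1) - K + 1
    if not 1 <= index <= n_windows:
        raise ValueError(f"index {index} outside 1..{n_windows}")
    start = REGION_START + index - 1
    return start, start + K - 1

for name in ("N1", "N101", "N148a", "N188"):
    print(name, olp_span(name))
```

Because consecutive peptides are shifted by one residue, the last peptide (N188) ends exactly at position 1455.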
Increased NS3 response rates due to coverage of sequence diversity
To test for cellular immune responses to the NS3 region studied (positions 1259–1455) and to identify dominant and potentially cross-reactive T cell responses, 40 HCV-infected individuals were tested. These included individuals with HCV genotype 1a infection (n = 16), genotype 1b infection (n = 11), as well as subjects infected with genotypes 2b (n = 2), 3a (n = 4), or with undefined genotypes (n = 7). Thirty-four patients were coinfected with HIV and 75% of the 40 patients (n = 30) were chronically infected, whereas 25% (n = 10) were HCV clearers (Table I). CTL responses to the 406 overlapping peptides (OLP) were assessed using in vitro-stimulated PBMC that were expanded for 2 wk before use. Overall, a total of 174 peptide-specific T cell responses were detected (Fig. 1A). Of these, 127 responses were directed against individual 10-mer regions (N1, etc.), with an additional 47 responses reacting with variants of a recognized 10-mer peptide (N1b, N1c, etc.; Fig. 1B). To assess whether the observed responses were due to the de novo induction of responses or cross-reactivity with non-HCV sequences, six HCV-negative individuals were tested using identical protocols. Among these subjects, only two weak responses to two different peptides were noticed, suggesting that in vitro expansion of de novo responses or cross-reactivity with non-HCV-derived Ags had a minimal impact on the response pattern seen in the 40 HCV-infected subjects.
As expected, covering sequence diversity in the test set allowed the detection of more responses than would have been seen using either a standard reference strain sequence alone (H77, relative increase of 32%) or genotype 1a/1b consensus sequences (22% more responses), indicating that 20–25% of all responses to this only moderately variable region of HCV would have been missed by conventional test reagents. Furthermore, 65 responses (37.4%) were identified that showed cross-reactivity with at least one other variant of the targeted 10-mer peptide. These 65 responses, roughly one-third of all detected responses, were distributed over 35 individual 10-mer regions/positions of the 197-aa-long stretch analyzed. These considerations are important for the design of immunogen sequences of minimal length, as for each 10-mer, its most cross-reactive variant could substitute for the reference sequence-based or any other 10-mer sequence, irrespective of the frequency of its representation in the circulating viral sequences. The importance of covering sequence diversity is also highlighted by the fact that peptides covering 10-mers with higher sequence heterogeneity were more frequently targeted than 10-mers spanning highly conserved regions (Fig. 1C; p = 0.001, two-sided Spearman rank test), suggesting that responses to more variable regions may frequently be underestimated (14, 21).
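The relationship between a relative gain in detected responses and the share of responses a narrower reagent would miss can be made explicit with a small calculation. The sketch assumes that a relative increase r means the variant-inclusive set detects (1 + r) times as many responses as the narrower reagent, so that the missed share is r / (1 + r); the function name is illustrative.

```python
def fraction_missed(relative_increase):
    """Share of variant-inclusive responses missed by a narrower reagent,
    assuming the broad set detects (1 + r) times as many responses."""
    return relative_increase / (1.0 + relative_increase)

for label, gain in [("H77 reference strain", 0.32), ("1a/1b consensus", 0.22)]:
    print(f"{label}: {fraction_missed(gain):.1%} of responses missed")

# Share of the 174 detected responses that were cross-reactive (65 responses).
cross_reactive_share = 65 / 174
print(f"cross-reactive responses: {cross_reactive_share:.1%}")
```

Under this reading, the two reagents would miss roughly 24% and 18% of the responses, respectively, which is close to the approximate 20–25% range quoted in the text.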
Immunodominant regions in NS3 and identification of novel HLA-B*3501-restricted CTL targets
The analysis of all reacting 10-mer peptides revealed several regions that were frequently targeted (Table II). Their inclusion in a condensed vaccine immunogen sequence could be of particular benefit, especially if their HLA restriction is known and the frequency of the restricting allele(s) in the target population is reasonably high. Some of these dominant regions matched sites of previously known CTL epitopes, including 10-mer peptides covering positions N100–N101, N113–N115, N145–N148, N177–N179 and corresponding to CTL epitopes commonly referred to as NS3–1359, NS3–1371, NS3–1395, and NS3–1395, respectively (17, 20). However, for a number of additional frequently targeted OLP, including 10-mer peptides N9b, N11, N14c, N15, N35, N62, and N118, no epitope had been described in the past. For five of these dominant regions that were targeted by more than three subjects (N9b, N14c, N35, N62, and N118), an HLA allele frequency analysis (22) revealed a statistically significant overrepresentation of individuals expressing HLA-B*3501 among the 10-mer responders compared with the rest of the cohort (Table III). Although not completely in line with previously published HLA-B*3501-binding motifs (http://www.syfpeithi.de), standard HLA restriction analyses using either the consensus sequence or the most frequently targeted sequence variant showed these responses to be HLA-B*3501-restricted (Fig. 2), as APC matched in the expression of HLA-B*3501 were recognized by specific T cell lines when pulsed with the corresponding peptide. In contrast, control APC that did not express HLA-B*3501 did not elicit a response (data not shown). Because HLA-B*3501 is a relatively common allele across different ethnicities (23), inclusion of these epitopes in the epitome design would provide substantial host population coverage.
In addition, since the consensus sequences of at least three of these epitopes were frequently targeted, the immunogenic epitope variant would likely be present in the circulating viral population, at least in genotype 1a and 1b endemic regions, and thus provide a potentially valuable component of a vaccine immunogen. In contrast, for 10 of the targeted peptides (Table II), variant sequences different from the consensus sequence were more frequently targeted than the consensus and could represent interesting immunogen candidates with superior immunogenicity compared with consensus sequences.
OLP | Name | Sequence | Frequency of Recognitionᵃ (B35+) | Frequency of Recognitionᵃ (B35−) | B35 Enrichment (p) |
---|---|---|---|---|---|
24 | N9b | YMSKAYGTDP | 75.0% | 3.1% | 0.000048 |
45 | N14c | YGTDPNIRTG | 87.5% | 3.1% | < 0.00001 |
82 | N35 | YSTYGKFLAD | 87.5% | 3.1% | < 0.00001 |
129 | N62 | STADTSILGI | 62.5% | 6.3% | 0.00153 |
232 | N118 | YGKAIPLEVI | 62.5% | 0.0% | 0.000085 |
ᵃ The frequency of HLA-B*35 expression in the tested cohort was 20%.
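The enrichment values in the table can be explored with a two-sided Fisher's exact test on a 2 × 2 table of responders versus HLA-B*35 carriage. This is only a sketch: the text does not state which test the cited allele frequency analysis (22) applies, and the counts used here (8 carriers among 40 subjects, with 6 of 8 carriers and 1 of 32 noncarriers responding to N9b) are inferred from the tabulated percentages rather than reported directly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2 x 2 table [[a, b], [c, d]].

    a, b: responders / nonresponders among allele carriers
    c, d: responders / nonresponders among noncarriers
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x responders among the carriers.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Assumed counts for N9b: 6/8 B35+ responders and 1/32 B35- responders.
p_n9b = fisher_exact_two_sided(6, 2, 1, 31)
print(f"N9b: p = {p_n9b:.6f}")
```

With these assumed counts the test yields a p value of about 4.8 × 10⁻⁵, in agreement with the tabulated value for N9b; the analogous calculation for N118 (5 of 8 versus 0 of 32) gives about 8.5 × 10⁻⁵.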
Condensation of epitope information into epitomes of variable length
Vaccine delivery, either as recombinant protein or vector expressed, will be greatly facilitated if the immunogen length can be kept to a minimum. To build an NS3 (1259–1455)-specific immunogen sequence that would contain as much immunogenic content as possible while still minimizing immunogen length, a recently described epitome approach was applied. The epitome concept combines the analysis of viral sequence diversity, HLA allele motif information, T cell cross-reactivity, and compression of immunogenic information into shorter immunogen sequences (15, 24). In this study, we used a simple algorithm (http://atom.research.microsoft.com/bio and http://research.microsoft.com/users/jojic/HIVepitome.html), following a stepwise inclusion/exclusion approach, starting with all of the 10-mer sequences that were targeted at least once (n = 90) in the cohort of 40 subjects (Table II). First, and certainly influenced by the HLA class I allele frequencies in the relatively small cohort tested, targeted 10-mer peptides were sorted based on their frequency of recognition and selected for inclusion in the epitome. In cases where several variants of the same 10-mer region were present in the test set, the 10-mer variant that was most consistently targeted among the subjects showing a response to the same 10-mer region was selected (inclusion criterion A; Table II). If two corresponding variants were targeted at equal frequency, the variant eliciting the stronger median response was chosen (criterion B; Table II). Furthermore, the frequency of the sequence variant(s) in the HCV database was used as an additional criterion for responses of comparable magnitude (defined as within 2-fold of the spot-forming cells per 10⁶ input cells) (criterion C; Table II). Finally, as responses to adjacent OLP may reflect reactivity to the 9-mer sequence shared between the two 10-mer peptides, the 10-mer OLP that elicited the response of greater magnitude was selected for inclusion (criterion D; Table II).
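The variant selection rules for a single 10-mer region (criteria A to C) can be sketched as follows. This is one possible reading of the rules, not the implementation used in the study: the function name `pick_variant`, the dictionary layout, and the interpretation of "comparable magnitude" as a median within 2-fold of the strongest tied variant are all assumptions, and criterion D (adjacent OLP) is not covered.

```python
from statistics import median

def pick_variant(variants):
    """Choose one variant of a 10-mer region.

    variants: list of dicts with hypothetical keys
      'name'      - peptide name
      'responses' - response magnitudes (SFC per 10^6 cells), one per responder
      'db_freq'   - frequency of the variant in the sequence database
    """
    # Criterion A: keep the variant(s) targeted by the most subjects.
    max_n = max(len(v["responses"]) for v in variants)
    if max_n == 0:
        return None
    tied = [v for v in variants if len(v["responses"]) == max_n]
    if len(tied) == 1:
        return tied[0]["name"]
    # Criterion B: among tied variants, find the strongest median response.
    top = max(median(v["responses"]) for v in tied)
    # Criterion C: variants within 2-fold of that median count as comparable;
    # among those, prefer the variant most frequent in the database.
    comparable = [v for v in tied if top <= 2 * median(v["responses"])]
    return max(comparable, key=lambda v: v["db_freq"])["name"]

region = [
    {"name": "variant_a", "responses": [400, 400], "db_freq": 0.9},
    {"name": "variant_b", "responses": [500, 700], "db_freq": 0.1},
]
print(pick_variant(region))
```

In this toy region both variants are targeted by two subjects and their median magnitudes lie within 2-fold of each other, so the database frequency decides in favor of the first variant.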
Based on this algorithm, the 90 targeted OLP were reduced to 52 individual peptide sequences, forming the set of immunogenic reactivities for building a compressed epitome sequence. For compression, all 52 peptide sequences were pairwise analyzed for sequence overlaps and, if present, were assembled in a linear sequence eliminating duplicated sequence information as far as possible. Overlapping 10-mer peptides were added to the epitome using a hill-climbing algorithm wherein, at each step in the algorithm, the 10-mer leading to the greatest increase in coverage of responses per length was added (Fig. 3A) (24). This compression of epitope sequences resulted in a construct of 336 residues, a considerably shorter immunogen length compared with a string of frequently targeted 10-mer peptides, which in this case would have been 520 aa (i.e., 60% longer than the epitome). This immunogen length can be further compressed by in silico simulations by applying different, pre-set levels of coverage of the observed in vitro responses (i.e., how many of the 52 targeted 10-mer peptides one wishes to be included in the epitome). For short immunogen sequences, the response coverage increases rapidly when adding one or a few additional frequently targeted 10-mer peptides. In contrast, addition of further epitopes to already extended epitome sequences yields diminishing benefits because disproportionately many rarely targeted 10-mer peptides are being included (Fig. 3A and data not shown). For the subsequent in vitro analysis of an expressed epitome, we selected an epitome length that mediated coverage of 75% of the observed responses (Fig. 3B). Beyond this coverage, the benefit of covering more responses may start to be outweighed by the extensive immunogen length required to cover these additional responses.
This selection thus resulted in an immunogen length of 147 aa, which is considerably shorter than the originally investigated region of NS3, while still containing all of the dominant epitopes targeted in the tested cohort and providing extensive epitope cross-reactivity.
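The idea of adding, at each step, the peptide with the greatest coverage gain per added length can be illustrated with a simplified greedy sketch. It is not the published epitome algorithm (24): here peptides are only ever appended to the right end of the growing sequence, a peptide already contained in the sequence costs no extra length, and the names `overlap` and `build_epitome` as well as the toy peptides and weights are hypothetical.

```python
def overlap(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def build_epitome(weights, target=0.75):
    """Greedily assemble peptides until `target` of all responses is covered.

    weights: dict mapping peptide sequence -> number of responses it explains.
    Returns (epitome sequence, fraction of responses covered).
    """
    total = sum(weights.values())
    remaining = dict(weights)
    epitome, covered = "", 0
    while remaining and covered < target * total:
        def gain(pep):
            # Residues that must be appended to include this peptide.
            added = 0 if pep in epitome else len(pep) - overlap(epitome, pep)
            return weights[pep] / (added + 1)  # +1 avoids division by zero
        best = max(remaining, key=gain)
        if best not in epitome:
            epitome += best[overlap(epitome, best):]
        covered += remaining.pop(best)
    return epitome, covered / total

toy_weights = {"ABCDEFGHIJ": 5, "FGHIJKLMNO": 3, "QRSTUVWXYZ": 1}
sequence, coverage = build_epitome(toy_weights)
print(sequence, len(sequence), f"{coverage:.0%}")
```

In the toy example the two overlapping peptides share five residues, so 8 of 9 responses are covered with 15 residues instead of the 20 needed for a simple concatenation, and the rarely targeted third peptide is left out at the 75% coverage target.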
Re-presentation of epitopes embedded in the expressed epitome
The E75% (epitome covering 75% of all detected responses) was retrovirally expressed in vitro and tested for the re-presentation of embedded epitopes in the context of their restricting HLA class I molecules. A human codon-optimized DNA sequence was designed and stably expressed in B-LCL, single HLA allele-transduced 221 cells, or the T1 cell line. As effector cells, T cell lines were generated against peptides N101 (HPNIEEVALS), N115 (IPFYGKAIPL, HLA-B*5101 restricted), N148a (KLVALGVNAV, HLA-A*0201 restricted), and N178 (ATDALMTGYT, HLA-A*0101 restricted), all of which were contained in the expressed epitome (Fig. 3B). After a 6-h coincubation of APC and the epitope-specific T cell line, intracellular staining for IFN-γ indicated that three of the four tested peptides were efficiently processed and presented (Fig. 4, A–C). Responses were significantly higher than the activity against cells expressing the empty vector (mock transduction) and comparable to the response elicited by peptide-pulsed APC, suggesting highly efficient processing and presentation of these epitopes from the epitome protein. In contrast, considerably less specific IFN-γ production was observed when testing for the re-presentation of the HLA-A*0201-restricted epitope KLVALGVNAV (Fig. 4D), despite consistently strong responses against peptide-pulsed epitome-expressing B-LCL. Since the N148a epitope is embedded in the epitome sequence between flanking residues that do not occur in the natural context of the viral NS3 protein sequence, its processing may be at least partly impaired. To test this, the processing and presentation of N148a when expressed in the epitome sequence was compared with its recognition when contained in its natural context.
Indeed, the stimulation of N148a-specific T cells by T1 cells transduced with the natural sequence of the NS3/4a protein demonstrated that N148a is efficiently processed from its natural flanking residues but not from the epitome sequence, suggesting that the flanking regions in the epitome were less supportive of its effective re-presentation. Although these data indicate that the re-presentation of some epitopes contained in an artificial epitome sequence may be impaired, they demonstrate that the E75% epitome was successfully expressed in the transduced APC and that the majority of the tested embedded epitopes were efficiently processed and presented in the context of the restricting HLA class I allele.
Discussion
The present data describe the rational inclusion of broadly cross-reactive T cell epitopes with extensive sequence diversity coverage into a compressed epitome sequence from which individual epitopes can be processed and presented in the context of restricting HLA class I molecules. The data establish the potential usefulness of the epitome approach, designed to build minimal length immunogen sequences by optimizing sequence overlap and provide sequence diversity by taking advantage of T cell cross-reactivity information. Although such immunogen design could help overcome issues of global or regional sequence diversity without introducing redundant sequence information and unnecessarily extending immunogen length, future applications will still profit from further refinements of the described algorithm. For instance, the selection of widely cross-reactive sequence variants will ultimately need to reflect a balance between currently circulating viral variants and the frequency at which subdominant sequences can cross-react with the most common variants. This will not be answered by simply testing recall responses in natural infections for their cross-reactive potential since these responses may have likely been induced by the most dominant variants in the circulating viral population. Instead, it will possibly require carefully assessing the cross-reactivity potential by responses induced by subdominant variants. Although studies available to date have largely been limited to the induction of variant-specific responses in animal models, recent analyses in HIV have also documented the superior immunogenicity of certain subdominant sequence variants (12, 25, 26, 27). Importantly, the inclusion of such variants in vaccine immunogen sequences may not only provide potent immunogen components but may also facilitate the screening and reliable detection of responses in cohorts with particularly high viral diversity in the circulating viral population. 
Evidently, further analyses will be needed in such studies to investigate whether the observed responses were induced in vivo by minor sequence variants or whether they represent true cross-reactivities. Given the often uncertain or unattainable knowledge of the infecting viral sequence and the potential occurrence of only transiently present sequence variants that may have induced responses, this can be a formidable challenge, especially when also considering that observed responses may reflect either fully cross-reactive T cell populations or the sum of distinct T cell populations in one individual that each react with a single variant only (28).
To build maximally effective immunogen sequences, cross-reactivity patterns will need to be investigated in detail, not only to inform epitome-based immunogens, but also when designing other single-sequence-based vaccine sequences. This will be especially crucial when covering regions of the virus with elevated sequence entropy (14, 21, 29). In fact, the data presented here indicate that responses to more variable 10-mer peptides were more frequent than responses to more conserved regions. This observation suggests that sequence diversity on a population level may be driven by T cell-mediated immune pressure, an observation with considerable implications for vaccine design (17). These findings are, at first sight, in contradiction to earlier reports that have described an inverse association between frequency of OLP recognition and sequence entropy (14, 21). However, it is important to note that such past studies used monomorphic peptide test sets (i.e., consensus sequences) that did not allow for broad detection of variant-specific reactivities. Only by including epitope variants in the test system, ideally at an even higher coverage level than attained here, will it be possible to reliably determine the true breadth and distribution patterns of T cell responses against highly variable pathogens such as HIV and HCV (13, 30).
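The sequence entropy referred to above can be made concrete with a short calculation. The following is a minimal sketch, assuming the commonly used Shannon entropy, H = -sum over residues a of p(a) log2 p(a), computed at each position of aligned, equal-length variants and averaged over the peptide. The function names and the toy variants are hypothetical, and the exact entropy measure used in the cited studies may differ.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues observed at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def peptide_entropy(variants: list[str]) -> float:
    """Mean per-position entropy over aligned variants of equal length (assumed input)."""
    if not variants:
        raise ValueError("at least one variant is required")
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("variants must be aligned and of equal length")
    columns = ("".join(v[i] for v in variants) for i in range(length))
    return sum(column_entropy(c) for c in columns) / length


# Hypothetical toy variants of one peptide, not taken from the study.
conserved = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKL"]
variable = ["ACDEFGHIKL", "ACDEFGHIKM", "SCDEFGHIKL", "SCDEFGHIKM"]
print(peptide_entropy(conserved), peptide_entropy(variable))
```

Under this definition a fully conserved peptide scores zero, and the score rises as more positions carry several variants at comparable frequencies.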
Although processing impairments could considerably limit the usefulness of epitomized immunogens, it is important to realize that such negative effects could hamper the success of any linear epitope-string immunogen sequence, even a noncompressed one. In the four investigated examples of embedded epitopes, all were flanked by non-naturally occurring residues, yet only epitope N148a showed impaired processing. This is in line with a recent study by Le Gall et al. (31) showing that moving epitope sequences into a non-natural context can drastically affect their processing. Moreover, the apparent negative impact of these flanking sequences on epitope N148a is supported by an analysis for potential proteasomal cut sites (http://www.mpiib-berlin.mpg.de/MAPPP/index.html), which did not yield any productive processing sites for this epitope (data not shown). Based on known processing motifs, it is likely that the glycine residue located two residues before the N-terminal end of epitope N148a in the epitome sequence negatively impacted the processing and presentation of N148a (31, 32). Of course, these analyses do not rule out that the presence of numerous additional epitopes contained in the epitome could interfere with the efficient presentation of the epitope. However, it has also been shown that many CD8+ T cell epitopes can overlap and still be efficiently presented, either from the same precursor protein (33, 34) or even from a multiepitope vaccine candidate (35). In addition, the epitome contains fewer epitopes than may be present in an infected cell, and since the response to N148a is induced in natural infection, interepitope competition for presentation is less likely to be responsible for the observed lack of presentation.
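The kind of flanking-residue comparison discussed above can be sketched in a few lines. This is an illustrative helper only, not the proteasomal cleavage prediction used in the study: the function name, the toy sequences, and the reading of "two residues before the N-terminal end" as the second position upstream of the first epitope residue are all assumptions.

```python
from typing import Optional


def upstream_residue(sequence: str, epitope: str, offset: int = 2) -> Optional[str]:
    """Return the residue `offset` positions upstream of the epitope's first residue.

    Returns None when the epitope lies too close to the start of the sequence.
    """
    start = sequence.find(epitope)
    if start < 0:
        raise ValueError("epitope not found in sequence")
    pos = start - offset
    return sequence[pos] if pos >= 0 else None


# Hypothetical toy contexts for the same epitope; not the real N148a sequences.
epitope = "SWTVLNPQR"
natural_context = "AAKLSWTVLNPQR"
epitome_context = "AAGLSWTVLNPQR"

for name, seq in (("natural", natural_context), ("epitome", epitome_context)):
    print(name, upstream_residue(seq, epitope, offset=2))
```

A check like this only reports which residue occupies a given flanking position in each context; whether that residue actually hinders cleavage still has to be judged from processing motifs or experimental data.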
To overcome at least some of these processing issues, one might also envision inserting natural flanking sequences or specific spacer residues that allow for more optimal proteasomal cleavage (reviewed in Ref. 36). Evidently, the insertion of such spacer residues could be limited to sites where epitopes are no longer flanked by their natural context and/or where processing algorithms would predict ineffective processing. Alternatively, one could also build the epitome sequence using extended epitope sequences that contain their natural flanking sequences, for example, using 15-mer sequences in which the targeted 9-mer epitope is placed centrally with three natural flanking residues on either side. Although this may ensure more efficient processing, it would extend immunogen length again, create novel epitope sequences, and introduce potentially unnecessary sequence information into the vaccine sequence. Regardless, the expanded immunogen length that results from using flanked epitope sequences can be countered by applying the same algorithms as used for designing the above E75% epitome. In fact, when the same epitopes that were selected for the E75% epitome described above were used as 16-mers (10-mer peptide plus 3 naturally flanking residues on either side), the resulting epitome still provided a 52% reduction in length compared with the string of fifty-two 16-mers (data not shown). In addition, the analysis of sequence diversity and variant cross-reactivity as presented here would allow us to specifically select broadly cross-reactive epitope variants that do not create processing interference for other epitopes in the epitome overlap. For instance, in the case of the less efficiently processed HLA-A*0201 epitope, the glycine residue is located at position 9 of the preceding 10-mer peptide and is unlikely to be an anchor residue for the targeted epitope (http://www.hiv.lanl.gov/content/hiv-db/PEPTGEN). Changing the glycine to a more processing-friendly residue that does not interfere with the recognition of the 10-mer peptide could possibly restore the processing and presentation of the HLA-A*0201 epitope without losing immunogenic content or the ability to cover sequence diversity, either by inclusion of the most common variants or by using cross-reactivity patterns.
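The trade-off between adding natural flanks and recompressing the resulting peptides can be illustrated with a toy calculation. The sketch below is not the epitome algorithm used in this study: it substitutes a simple greedy suffix/prefix overlap merge (a standard shortest-superstring heuristic) and ignores cross-reactivity weighting entirely. All function names, the toy protein, and the toy epitopes are hypothetical.

```python
def extend_with_flanks(protein: str, epitope: str, flank: int = 3) -> str:
    """Return the epitope plus up to `flank` natural residues on either side."""
    start = protein.find(epitope)
    if start < 0:
        raise ValueError("epitope not found in protein")
    return protein[max(0, start - flank): start + len(epitope) + flank]


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_epitome(peptides: list[str]) -> str:
    """Greedily merge the pair with the largest overlap until one string remains."""
    unique = sorted(set(peptides))
    # Drop peptides that are already fully contained in another peptide.
    pool = [p for p in unique if not any(p != q and p in q for q in unique)]
    while len(pool) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j and overlap(a, b) > best_k:
                    best_k, best_i, best_j = overlap(a, b), i, j
        merged = pool[best_i] + pool[best_j][best_k:]
        # Keep the remaining peptides unless the merged string already covers them.
        pool = [p for idx, p in enumerate(pool)
                if idx not in (best_i, best_j) and p not in merged]
        pool.append(merged)
    return pool[0] if pool else ""


def percent_reduction(concatenated_len: int, epitome_len: int) -> float:
    """Length saved by the compressed sequence relative to plain concatenation."""
    return 100.0 * (1.0 - epitome_len / concatenated_len)


# Hypothetical toy protein and epitopes; not HCV sequences.
protein = "ACDEFGHIKLMNPQRSTVWY"
epitopes = ["EFGHIK", "HIKLMN", "NPQRST"]
flanked = [extend_with_flanks(protein, e, flank=2) for e in epitopes]
epitome = greedy_epitome(flanked)
print(epitome, percent_reduction(sum(len(p) for p in flanked), len(epitome)))
```

For the figures quoted above, fifty-two 16-mers concatenate to 52 x 16 = 832 residues, so a 52% reduction corresponds to an epitome of roughly 400 residues; the exact length is not stated in the text.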
Together, as broadly effective vaccines for highly variable viruses will need to induce cross-reactive responses, contain extensive sequence variability, or both, vaccine delivery and in vivo immunogenicity are crucial factors in inducing the qualitatively and quantitatively “right” response. Packing this immunogenic information into either recombinant proteins or vaccine expression vectors will equally profit from approaches that can compress the relevant information into shortened immunogen sequences. The epitome approach, combined with an effective processing algorithm, HLA motif scans, and consideration of host population HLA allele frequencies and locally circulating viral diversity, likely provides such a strategy and could provide important support for future vaccine design.
Disclosures
David Heckerman, Nebojsa Jojic, and Carl Kadie are employees of Microsoft Research.
Footnotes
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
This work was supported by grants from the Swiss National Science Foundation (Grant 3200B0-103874) and the European Community (European Community Network VIRGIL).
Abbreviations used in this paper: HCV, hepatitis C virus; B-LCL, EBV-transformed B cell line; NS3, nonstructural protein; OLP, overlapping peptide.