Abstract
Identification of the specific HLA locus and allele presenting an epitope for recognition by specific TCRs (HLA restriction) is necessary to fully characterize the immune response to Ags. Experimental determination of HLA restriction is complex and technically challenging. As an alternative, the restricting HLA locus and allele can be inferred by genetic association, using response data in an HLA-typed population. However, simple odds ratio (OR) calculations can be problematic when dealing with large numbers of subjects and Ags, and because the same epitope can be presented by multiple alleles (epitope promiscuity). In this study, we develop a tool, denominated Restrictor Analysis Tool for Epitopes, to extract inferred restriction from HLA class II–typed epitope responses. This automated method infers HLA class II restriction from large datasets of T cell responses in HLA class II–typed subjects by calculating ORs and relative frequencies from simple data tables. The program is validated by: 1) analyzing data of previously determined HLA restrictions; 2) experimentally determining in selected individuals new HLA restrictions using HLA-transfected cell lines; and 3) predicting HLA restriction of particular peptides and showing that corresponding HLA class II tetramers efficiently bind to epitope-specific T cells. We further design a specific iterative algorithm to account for promiscuous recognition by calculation of OR values for combinations of different HLA molecules while incorporating predicted HLA binding affinity. The Restrictor Analysis Tool for Epitopes program streamlines the prediction of HLA class II restriction across multiple T cell epitopes and HLA types.
Introduction
Determination of the HLA restriction of human T cell responses is becoming increasingly necessary, as new approaches to immunophenotyping such as CyTOF, Fluidigm, and RNA profiling rely on tetramer staining as a way to gate or isolate Ag-specific T cells (1–3). In molecular terms, HLA restriction reflects the formation of a trimolecular complex, encompassing the Ag-specific TCR and a bimolecular complex formed by a specific epitope and a specific HLA molecule. TCR binding occurs when the epitope–HLA complex “fits” the Ag-binding site of the specific TCR. Hence recognition of that epitope by the TCR is “restricted” by that particular HLA type. In cellular terms, a T cell response to a given epitope being restricted by a given HLA refers to the fact that T cell recognition will only occur when that epitope is presented by an APC or a target expressing the specific HLA molecule that was involved in the original priming and elicitation of the T cell response.
The production of tetramer staining reagents relies on the exact identification not only of the epitope recognized by the specific T cells, but also of the specific HLA molecule(s) that bind and present that epitope to T cell scrutiny (4). Because HLA molecules are polygenic (encoded by multiple loci) and polymorphic (each locus exists as many allelic variants), and because HLA diversity in human populations is extreme, this task is not trivial (5). HLA class I molecules are encoded at multiple loci (A, B, and C as the main ones). As of January 2015, the three major HLA class I α-chain loci comprise 9308 alleles (HLA-A [2995], HLA-B [3760], and HLA-C [2553]), all of which bind the invariant β2-microglobulin chain. HLA class II molecules are heterodimers that consist of an α (less polymorphic) and a β (highly polymorphic) chain, and the four major class II loci include 97 α and 2963 β alleles (HLA-DPA [38 alleles] and -DPB [489], HLA-DQA [52] and -DQB [734], HLA-DRA [7] and -DRB1 and -DRB3/4/5 [1740]) (5).
The exact HLA type of human subjects in a population under study can be readily determined by a variety of methods, increasingly relying on second-generation sequencing methods (1–3). The gene and allelic variant restricting specific T cell responses is determined by additional experimentation relying on classical immunobiological approaches, such as inhibition by HLA locus–specific Abs, and use of matched/mismatched or single HLA molecule transfected cell lines (6).
An alternative is to use classic genetic tools based on calculations of odds ratios (ORs) (7). The OR method has been used extensively to estimate the likelihood that a certain genetic trait is associated with a certain biological condition or outcome (8–10). The method is based on comparing the frequency with which a certain outcome is observed in individuals carrying a given gene or allelic variant to the frequency of outcome in individuals not expressing the given gene or allelic variant. For example, ORs have been extensively used to pinpoint and quantify the relative contribution of various genes to autoimmune diseases (11, 12).
The OR method can be used to determine likely restrictions, by considering the presence or absence of a response to a given epitope as the biological outcome, and calculating the OR and associated statistical significance for each HLA molecule expressed in human subjects in which a response, or lack thereof, is observed. The method is simple and powerful. However, by definition, statistical significance is reached only when a suitably large number of subjects are assayed. As a result, calculations of OR and statistical significance may become cumbersome, especially when a relatively large number of epitopes is simultaneously analyzed in a cohort represented by a large number of allelic polymorphisms.
Additional complexity may arise because of “HLA linkage” and/or “epitope promiscuity.” Certain HLA class I and II alleles are sometimes in very strong linkage disequilibrium; thus, a positive OR can be obtained for an HLA molecule that is not the real restricting element of the epitope but is in linkage disequilibrium with the real restricting allele. Further, an epitope can be restricted by multiple HLA molecules (a phenomenon called epitope promiscuity). Obtaining a positive OR may then be problematic, because restriction by multiple alleles may proportionally increase the noise (“rider” alleles) without increasing the signal, thereby masking relevant restricting alleles. This can further complicate identification of the restricting allele for an epitope.
In this article, we report the development and initial validation of the Restrictor Analysis Tool for Epitopes (RATE), a computational application that allows the user to obtain reports describing HLA class II restrictions inferred from response patterns in an HLA-typed human population based on a standardized process flow and statistical evaluation.
Materials and Methods
Programming
The RATE tool is a Python 2.6.5+ CGI script. The web interface is implemented using HTML and python-CGI with the statistical analysis done using RPy (http://rpy.sourceforge.net).
Statistical analysis
ORs assess the strength of association of one property to another in a sample population (7). In the current application, ORs are used to quantify the strength of associations between expression of a specific allele and detection of positive immune response. An OR greater than 1 indicates a positive association between the two properties in question (i.e., expressing the specific allele increases the “odds” of having positive immune response). ORs are calculated according to the following formula:
$$\mathrm{OR} = \frac{(A^{+}R^{+}) \times (A^{-}R^{-})}{(A^{+}R^{-}) \times (A^{-}R^{+})}$$
where A+ = number of donors expressing a specific allele, A− = number of donors not expressing the specific allele, R+ = number of donors with a positive immune response to the specific peptide, and R− = number of donors who do not have a positive immune response to the specific peptide. Thus, for example, A+R+ indicates the number of donors expressing the specific allele and having a positive response against a specific peptide.
The OR becomes infinite when a term in its denominator is zero, for example, when none of the donors who do not express the allele have a positive response (A−R+ = 0). Although this cannot be avoided, a relative frequency (RF) can be used to estimate the enrichment of responders among donors expressing a given allele relative to the whole population. Because the denominator of the RF (the response rate in all donors) is never zero as long as at least one donor responds, the RF remains finite even in instances where the OR is infinite due to division by zero. Accordingly, we also calculate RF, which is expressed as the ratio of the response in donors expressing the specific allele to the response in all donors, and it is calculated as follows:
$$\mathrm{RF} = \frac{(A^{+}R^{+}) \,/\, (A^{+}R^{+} + A^{+}R^{-})}{(A^{+}R^{+} + A^{-}R^{+}) \,/\, (A^{+}R^{+} + A^{+}R^{-} + A^{-}R^{+} + A^{-}R^{-})}$$
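As a worked illustration of the two measures, the following minimal Python 3 sketch computes OR and RF from the four donor counts. The function names and argument order are assumptions chosen for this example and are not taken from the RATE source code; the counts used are those of allele A in Table I and of the third peptide in Table II.

```python
import math


def odds_ratio(a_pos_r_pos, a_neg_r_pos, a_pos_r_neg, a_neg_r_neg):
    """OR = (A+R+ * A-R-) / (A+R- * A-R+); infinite when the denominator is zero."""
    numerator = a_pos_r_pos * a_neg_r_neg
    denominator = a_pos_r_neg * a_neg_r_pos
    if denominator == 0:
        # 0/0 is undefined; any other x/0 is reported as infinity.
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


def relative_frequency(a_pos_r_pos, a_neg_r_pos, a_pos_r_neg, a_neg_r_neg):
    """RF = response rate in allele carriers / response rate in all donors."""
    carriers = a_pos_r_pos + a_pos_r_neg
    responders = a_pos_r_pos + a_neg_r_pos
    total = carriers + a_neg_r_pos + a_neg_r_neg
    if carriers == 0 or responders == 0:
        return math.nan  # undefined without carriers or without responders
    return (a_pos_r_pos / carriers) / (responders / total)


# Allele A of Table I: A+R+ = 8, A-R+ = 3, A+R- = 27, A-R- = 45
or_a = odds_ratio(8, 3, 27, 45)
rf_a = relative_frequency(8, 3, 27, 45)

# Third peptide of Table II: no responders among non-carriers, so OR is infinite
or_inf = odds_ratio(6, 0, 59, 22)
rf_finite = relative_frequency(6, 0, 59, 22)
```

Rounded to the precision used in the tables, these calls give OR 4.4 and RF 1.7 for allele A, and an infinite OR with RF 1.34 for the Table II peptide.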
Fisher’s exact test is applied here to calculate the statistical significance of the difference in immune response between the donors who express a specific allele and those who do not, thus highlighting the restricting allele for each peptide. A p value < 0.05 was considered statistically significant.
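For readers who want to reproduce the significance calculation without R, the sketch below implements a two-sided Fisher's exact test directly from the hypergeometric distribution. It is an illustration only: RATE itself calls R through RPy, and the function name, argument order, and tie tolerance here are assumptions. The two-sided p value is taken as the sum of the probabilities of all tables that are no more likely than the observed one. It requires Python 3.8 or later for `math.comb`.

```python
from math import comb


def fisher_exact_p(a_pos_r_pos, a_neg_r_pos, a_pos_r_neg, a_neg_r_neg):
    """Two-sided Fisher's exact p value for an allele-by-response 2x2 table."""
    carriers = a_pos_r_pos + a_pos_r_neg
    non_carriers = a_neg_r_pos + a_neg_r_neg
    responders = a_pos_r_pos + a_neg_r_pos
    total = carriers + non_carriers
    all_tables = comb(total, responders)

    def prob(x):
        # Probability that exactly x of the responders are allele carriers.
        return comb(carriers, x) * comb(non_carriers, responders - x) / all_tables

    lowest = max(0, responders - non_carriers)
    highest = min(carriers, responders)
    observed = prob(a_pos_r_pos)
    # Small relative tolerance so that ties are not lost to rounding.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= observed * (1 + 1e-7))
    return min(p_value, 1.0)


# Perfect association in six donors: 3 carriers all respond, 3 non-carriers do not.
p_small_cohort = fisher_exact_p(3, 0, 0, 3)
```

The small cohort in the usage line shows why cohort size matters: even a perfect association among six donors yields p = 0.1, which does not pass the 0.05 threshold.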
No adjustments are made to the p values to correct for multiple statistical tests. Accordingly, the p values cannot be taken as actual probabilities that a given restriction is true, but rather serve as a relative ranking of which restrictions have the most statistical support. The purpose of these rankings is to guide the further experiments that are necessary to fully confirm restrictions, allowing the experimenter to focus on prioritized candidate HLA alleles.
Iterative algorithm for detection of promiscuous binding alleles
A specific algorithm was designed to address epitope promiscuity. The algorithm first identifies the HLA alleles expressed in each of the donors who gave a positive response to an epitope. The binding affinity of the epitope for each of the identified alleles is then predicted with the Immune Epitope Database (IEDB) MHC binding prediction tools, accessed through their RESTful web services (http://tools.immuneepitope.org/main/html/tools_api.html) (13). The binding prediction is done using the consensus method, which uses a combination of NN-align, SMM-align, and CombLib/Sturniolo. If the specified allele is not available under the consensus method, the NetMHCIIpan method is chosen by default. More details on prediction methods are available in Paul et al. (14). All alleles predicted to bind the peptide (IEDB consensus percentile ≤ 15.0) are selected for further screening. For general binding predictions for individual alleles, the recommended threshold for considering a peptide to be a binder is an IEDB consensus percentile of 10.0, whereas for predicting promiscuous binding, the recommended threshold is 20.0 (14, 15). The cutoff used in this study (15.0) was chosen as a midpoint between these two thresholds, on the reasoning that too stringent a cutoff might overlook potential promiscuous restrictions.
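The screening step can be pictured with the short Python 3 sketch below, which keeps only the alleles whose predicted percentile rank is at or below the 15.0 cutoff. The percentile values are hypothetical numbers made up for illustration, and the function name is an assumption; the sketch does not call the IEDB web services, whose request format is not described here.

```python
BINDING_PERCENTILE_CUTOFF = 15.0  # cutoff stated in the text


def select_candidate_alleles(predicted_percentiles, cutoff=BINDING_PERCENTILE_CUTOFF):
    """Return the alleles whose predicted percentile rank is <= cutoff."""
    return sorted(allele for allele, percentile in predicted_percentiles.items()
                  if percentile <= cutoff)


# Hypothetical predictions for one epitope (lower percentile = stronger binding).
example_predictions = {
    "DRB1*15:01": 3.2,
    "DRB5*01:01": 14.9,
    "DRB1*03:01": 15.0,
    "DRB1*07:01": 42.0,
}
candidates = select_candidate_alleles(example_predictions)
```

Note that the comparison is inclusive, matching the "≤ 15.0" criterion, so an allele predicted at exactly the 15th percentile is retained.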
After the algorithm calculates the OR, RF, and p value for each individual allele, it then combines response data for all possible allele pairs and evaluates whether an improved p value is obtained. Subsequent iterations combine various allele groups, and p values are calculated with each iteration. Iteration cycles continue until the p value cannot be improved further. The algorithm reports the allele combinations associated with the best p value for each epitope. For example, consider alleles A, B, C, and D with the values listed in Table I for an epitope.
Allele | A+R+ | A−R+ | A+R− | A−R− | RF | OR | p |
---|---|---|---|---|---|---|---|
A | 8 | 3 | 27 | 45 | 1.7 | 4.4 | 0.046 |
B | 4 | 7 | 10 | 62 | 2.2 | 3.5 | 0.084 |
C | 3 | 8 | 8 | 64 | 2.1 | 2.9 | 0.157 |
D | 4 | 7 | 9 | 63 | 2.3 | 3.9 | 0.065 |
B + C + D | 10 | 1 | 20 | 52 | 2.5 | 17.4 | <0.001 |
Although only allele A is individually associated with p < 0.05, iteratively combining the data for the other alleles reveals that the combination of the three alleles (B, C, and D) gives a more significant p value (p < 0.001). The algorithm reports this combination of alleles as potential promiscuous restricting alleles.
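One possible reading of this iteration is sketched below in Python 3 (3.8 or later). It is an assumption-laden illustration, not the RATE implementation: a donor counts as A+ for a combination if the donor carries any allele in it, all combinations of a given size are evaluated, and the search stops when the next size no longer improves the best p value. The donor data, function names, and stopping rule are hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_p(a_pos_r_pos, a_neg_r_pos, a_pos_r_neg, a_neg_r_neg):
    """Two-sided Fisher's exact p value from the hypergeometric distribution."""
    carriers = a_pos_r_pos + a_pos_r_neg
    non_carriers = a_neg_r_pos + a_neg_r_neg
    responders = a_pos_r_pos + a_neg_r_pos
    all_tables = comb(carriers + non_carriers, responders)

    def prob(x):
        return comb(carriers, x) * comb(non_carriers, responders - x) / all_tables

    observed = prob(a_pos_r_pos)
    support = range(max(0, responders - non_carriers), min(carriers, responders) + 1)
    return min(1.0, sum(prob(x) for x in support if prob(x) <= observed * (1 + 1e-7)))


def combination_p(donors, combo):
    """donors: list of (set_of_alleles, responded). A+ means carrying ANY allele in combo."""
    combo = set(combo)
    counts = {(True, True): 0, (False, True): 0, (True, False): 0, (False, False): 0}
    for alleles, responded in donors:
        counts[(bool(combo & alleles), responded)] += 1
    return fisher_p(counts[(True, True)], counts[(False, True)],
                    counts[(True, False)], counts[(False, False)])


def best_combination(donors, candidates):
    """Grow the combination size until the best p value stops improving."""
    best_combo, best_p = None, 1.0
    for size in range(1, len(candidates) + 1):
        level_p, level_combo = min(
            (combination_p(donors, combo), combo)
            for combo in combinations(sorted(candidates), size))
        if best_combo is not None and level_p >= best_p:
            break
        best_combo, best_p = level_combo, level_p
    return best_combo, best_p


# Hypothetical cohort: three responders carry B, three carry C, ten non-responders carry X.
cohort = [({"B"}, True)] * 3 + [({"C"}, True)] * 3 + [({"X"}, False)] * 10
combo, p_value = best_combination(cohort, ["B", "C"])
```

In this toy cohort neither B nor C alone explains all responders, but the pair does, so the pair is returned with a much smaller p value than either single allele. Exhaustive enumeration grows quickly with the number of candidate alleles, which is one reason to restrict the candidates to predicted binders first.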
Availability
The RATE tool is available online at http://iedb-rate.liai.org.
Immune response datasets
Immune response datasets were generated as described previously (16) and by D.M. McKinney, C.S. Lindestam Arlehamn, V. Rozot, E. Makgotlho, W. Hanekom, T.J. Scriba, and A. Sette (manuscript in preparation). In brief, immune responses to various Mycobacterium tuberculosis epitopes in PBMCs from individuals with latent M. tuberculosis infection (LTBI) were measured by IFN-γ–specific ELISPOT as representative of Th1 responses and reported as spot-forming cells (SFCs) per million cells.
In addition, CD4+ T cell immune responses to 15-mer peptides from the acellular Bordetella pertussis vaccine (M.B.C. Dillon, T.A. Bancroft, R. Kolla, S. Paul, J. Sidney, B. Peters, and A. Sette, manuscript in preparation) were measured as previously described (17, 18). In brief, PBMCs isolated from whole blood were stimulated with isolated B. pertussis vaccine proteins for 14 d, with fresh human rIL-2 added every 3 d. Subsequently, the cells were restimulated with peptides and lymphokine production was measured with a dual IFN-γ and IL-5 ELISPOT assay, representative of Th1 and Th2 CD4+ subsets, respectively (18).
The criteria for positivity for the ELISPOT assay we used were as follows: responses were considered positive if the stimulus had >20 SFC/10⁶ PBMCs, p < 0.05 by Student t test, and a stimulation index >2.0. These criteria are the ones we have consistently used for >10 y and have been maintained for consistency’s sake in this study as well (15, 17, 19–22).
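The three criteria must all hold for a response to be scored positive, which can be written as a simple predicate. This Python 3 sketch assumes that the SFC count, the t test p value, and the stimulation index have already been computed upstream; the function and parameter names are hypothetical.

```python
def is_positive_response(sfc_per_million, p_value, stimulation_index):
    """Apply the three ELISPOT positivity criteria stated in the text."""
    return (sfc_per_million > 20        # more than 20 SFC per 10^6 PBMCs
            and p_value < 0.05          # Student t test
            and stimulation_index > 2.0)


clear_positive = is_positive_response(150, 0.01, 6.0)
borderline = is_positive_response(20, 0.01, 6.0)   # exactly 20 SFC is not > 20
```

All three comparisons are strict, so a well of exactly 20 SFC per million cells is scored negative.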
HLA typing
HLA class II typing was performed using next-generation sequencing methods (D.M. McKinney, Z. Fu, L. Le, J.A. Greenbaum, B. Peters, and A. Sette, submitted for publication). Specifically, amplicons were generated from the appropriate class II locus for exons 2–4 by PCR amplification. Sequencing libraries were generated (Illumina Nextera XT) from these amplicons and sequenced with MiSeq Reagent Kit v3 as per manufacturer’s instructions (Illumina, San Diego, CA). Sequence reads were matched to HLA alleles and donor genotyping was assigned (D.M. McKinney, Z. Fu, L. Le, J.A. Greenbaum, B. Peters, and A. Sette, submitted for publication).
Tetramer staining experiments
B. pertussis biotinylated HLA class II tetramers conjugated to streptavidin-PE were provided by the Tetramer Core Laboratory at Benaroya Research Institute. For B. pertussis ex vivo tetramer staining experiments, CD4+ T cells were isolated from cryopreserved PBMCs using CD4+ T cell Isolation Kit (Miltenyi) according to manufacturer’s instructions. For B. pertussis in vitro tetramer staining, isolated PBMCs were stimulated as described earlier with the tetramer-specific peptide for 14 d and subsequently harvested for analysis. Purified CD4+ cells or expanded cells were incubated with a 1:50 dilution of tetramer-PE for 2 h at room temperature and stained for 30 min at room temperature in FACS buffer (PBS with 2% FBS) with Abs to the following surface markers: CD3-AF700 (BD Biosciences), CD4-allophycocyanin-ef780 (eBioscience), CD8-V500 (BD Biosciences), CD45RA-ef450 (eBioscience), and CCR7-PerCP-Cy5.5 (Biolegend). After washing, cells were resuspended in PBS and read on a BD LSRII and analyzed with FlowJo. M. tuberculosis biotinylated HLA class II tetramers were generously provided by the National Institutes of Health tetramer core facility. For M. tuberculosis ex vivo tetramer staining, PBMCs were stained with tetramer-PE, CD4-FITC, CD8a-PECy5, CD19-PECy5, CD11b-PECy5, CD56-PECy5, and Live-Dead Aqua (Invitrogen). Tetramer-stained cells were enriched with anti-PE magnetic beads (Miltenyi) and analyzed.
Experimental restriction determination by single HLA-transfected cells
To verify whether the predicted HLA/epitope restrictions were correct, we performed Ag presentation assays with single HLA transfectants using cell lines as described previously (6). In brief, PBMCs isolated from whole blood were incubated ex vivo with peptide-pulsed EBV-transformed cell lines expressing selected HLA molecules in an IFN-γ ELISPOT assay, as described earlier. The cell lines used for transfection were DAP.3 (for DRB1*03:01, DRB1*04:01, DRB1*07:01, DRB1*11:01, DRB1*13:01, DRB1*15:01, DRB4*01:01, and DRB5*01:01) and RM3 (for DRB3*02:02, DPA1*02:01/DPB1*01:01, DPA1*01:01/DPB1*04:01, DQA1*05:01/DQB1*02:01, and DQA1*01:02/DQB1*06:02) (6). All DR lines used DRA1*01:01 as the α-chain. Allele restriction was determined by comparing responses of peptide-pulsed cell lines with media-only–pulsed cell lines. The level of statistical significance was determined with a Student t test using the mean of triplicate values of the response against peptide-pulsed cell lines versus the response against the media-pulsed cell line control.
Results
Standardized import format generation for subjects and epitope-donor response
To afford analysis of congruent datasets in a standardized fashion, we designed RATE to allow importing test results from a number of different epitopes in a given biological assay, and for responses in a cohort of donors/test subjects. A tab delimited plain text file format was chosen because it is typically available as a data export option from most instrumentation, or can easily be generated from commonly used graphing and spreadsheet software. Supplemental Table I shows sample response data in a spreadsheet.
The main focus of our efforts was to design and validate a method to facilitate determination of restrictions for HLA class II molecules, and accordingly we mostly used data obtained in our laboratory where HLA class II responses were measured in a number of different settings, using 15-mer peptides and ELISPOT or intracellular cytokine staining assays, in a sufficient number of donors. To measure immune responses, we routinely use ELISPOT. However, data from any assay may be used, as long as operative criteria for positivity can be defined. Although the two statistical measures used (OR and RF, as described later) depend on a binary outcome (positive or negative), the absolute values determined by the assay can be entered directly, and a threshold for positivity can be chosen before calculation of the metrics. Positivity thresholds for ELISPOT measures have been described previously by our group (6, 16). The tool does not require that all epitopes be tested in all donors.
Similar to the response data, RATE uploads HLA typing data provided as a tab delimited plain text file (Supplemental Table II). Although different typing methods determine HLA type to differing levels of resolution (e.g., allele group, protein) (23), the OR and RF methods are independent from the HLA methods used for typing. Indeed, the MHC type analyzed may be processed using any user-preferred nomenclature system as long as the typing categories are mutually exclusive. In our laboratory, we routinely HLA-type to the protein level (four-digit typing, e.g., HLA-DRB*01:01), but serological (HLA-DRB1), allele group (HLA-DRB*01), and even typing formats down to the level demarcating synonymous DNA substitutions (HLA-DRB*01:01:01) are also compatible with the approach. Furthermore, complete HLA class II typing is not required, and data for a given locus or even a partial list of alleles can be used, with the caveat that the tool will output the most likely statistical association given the data at hand, whereas a better candidate could have been identified with a more complete dataset.
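To make the two inputs concrete, the sketch below reads a response table and a typing table from tab-delimited text in Python 3. The column layout is hypothetical (one row per donor, a `donor` column followed by one column per peptide or per typed allele) and may differ from the layout in Supplemental Tables I and II; blank cells are treated as "not tested" or "not typed".

```python
import csv
import io


def read_responses(handle, threshold):
    """Return {peptide: {donor: bool}}; blank cells (untested donors) are skipped."""
    responses = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        donor = row.pop("donor")
        for peptide, value in row.items():
            if value is None or value.strip() == "":
                continue  # this donor was not tested with this peptide
            responses.setdefault(peptide, {})[donor] = float(value) > threshold
    return responses


def read_typing(handle):
    """Return {donor: set of alleles}; the first row is a header."""
    rows = csv.reader(handle, delimiter="\t")
    next(rows, None)  # skip header
    return {row[0]: {allele for allele in row[1:] if allele} for row in rows if row}


# Hypothetical file contents (SFC per million cells; empty cell = not tested).
response_text = "donor\tpepA\tpepB\nD1\t150\t\nD2\t5\t40\n"
typing_text = "donor\tallele1\tallele2\nD1\tDRB1*15:01\tDRB1*03:01\nD2\tDRB1*07:01\t\n"

responses = read_responses(io.StringIO(response_text), threshold=20)
typing = read_typing(io.StringIO(typing_text))
```

Applying the positivity threshold at read time mirrors the point made above: raw assay values can be supplied, and the binary outcome is derived before any OR or RF is calculated.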
Two alternative outputs of calculated ORs and RFs
For each epitope–HLA combination, RATE generates a matrix tabulating the number of positive versus negative responders among the subjects expressing that particular HLA type, as well as the number of positive and negative responders among the subjects not expressing that HLA type. Next, for each epitope, RATE calculates the ORs and RFs corresponding to each of the HLA molecules following classical formulas, as described in Materials and Methods. The addition of RF allows the user to calculate a numerical value for epitope–HLA pairs for which an OR would be incalculable because of lack of response among individuals not expressing that particular HLA allele. The Fisher’s exact test is used to calculate the significance (p value) associated with each epitope–HLA combination.
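The tabulation step for a single epitope–HLA pair might look like the following Python 3 sketch. The data structures (a donor-to-response mapping for tested donors and a donor-to-alleles mapping) and all names are assumptions for illustration; donors who were typed but not tested with the peptide are counted separately, as in the "Response n/a" column of the report.

```python
def contingency_counts(epitope_responses, typing, allele):
    """Tabulate A+R+, A-R+, A+R-, A-R- for one epitope and one allele.

    epitope_responses: {donor: bool} for donors tested with this epitope
    typing:            {donor: set of alleles} for all HLA-typed donors
    Returns (counts, number of typed donors not tested with the epitope).
    """
    counts = {"A+R+": 0, "A-R+": 0, "A+R-": 0, "A-R-": 0}
    for donor, responded in epitope_responses.items():
        if donor not in typing:
            continue  # no HLA typing available, so the donor cannot be classified
        key = ("A+" if allele in typing[donor] else "A-") + ("R+" if responded else "R-")
        counts[key] += 1
    not_tested = len(set(typing) - set(epitope_responses))
    return counts, not_tested


# Hypothetical data: D4 is typed but was not tested with this epitope.
example_responses = {"D1": True, "D2": False, "D3": True}
example_typing = {"D1": {"X"}, "D2": {"X"}, "D3": {"Y"}, "D4": {"X"}}
table, untested = contingency_counts(example_responses, example_typing, "X")
```

The four counts feed directly into the OR, RF, and Fisher's exact test calculations, and the number of tested donors plus the untested count equals the size of the typed cohort.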
For each peptide, the OR, RF, and p values for each of the HLA types expressed in the responder subjects are ranked and tabulated from high to low OR. In this reporting format, the number of responding individuals positive in the assay (R+) and expressing each particular HLA (A+) is also presented (A+R+). The number of similarly defined A+R−, A−R+, and A−R− individuals is also reported. A partial example of this type of report is shown in Table II.
Peptide No. | Peptide ID | Peptide Sequence | Allele No. | Allele | A+R+ | A−R+ | A+R− | A−R− | No. of Donors | Response n/a | RF | OR | p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3531.0375 | MSQIMYNYPAMMAHA | 1 | DPA1*01:03 | 13 | 9 | 48 | 11 | 81 | 6 | 0.78 | 0.34 | 0.048 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 1 | DPA1*01:03 | 7 | 1 | 54 | 19 | 81 | 6 | 1.16 | 2.44 | 0.672 |
3 | 3531.0511 | GEEYLILSARDVLAV | 1 | DPA1*01:03 | 6 | 0 | 59 | 22 | 87 | 0 | 1.34 | inf | 0.330 |
4 | 3550.0065 | STHEANTMAMMARDT | 1 | DPA1*01:03 | 3 | 1 | 34 | 9 | 47 | 40 | 0.95 | 0.80 | 1.000 |
5 | 3550.0063 | DLVRAYHSMSSTHEA | 1 | DPA1*01:03 | 5 | 0 | 32 | 10 | 47 | 40 | 1.27 | inf | 0.569 |
6 | 3550.0061 | DLVRAYHAMSSTHEA | 1 | DPA1*01:03 | 4 | 0 | 33 | 10 | 47 | 40 | 1.27 | inf | 0.564 |
7 | 3550.0060 | AMEDLVRAYHAMSST | 1 | DPA1*01:03 | 2 | 0 | 35 | 10 | 47 | 40 | 1.27 | inf | 1.000 |
8 | 3550.0059 | IMYNYPTMLGHAGDM | 1 | DPA1*01:03 | 4 | 1 | 33 | 9 | 47 | 40 | 1.02 | 1.09 | 1.000 |
9 | 3550.0058 | MSQIMYNYPTMLGHA | 1 | DPA1*01:03 | 4 | 1 | 33 | 9 | 47 | 40 | 1.02 | 1.09 | 1.000 |
10 | 3550.0057 | IMYNYPAMLGHAGDM | 1 | DPA1*01:03 | 7 | 1 | 30 | 9 | 47 | 40 | 1.11 | 2.07 | 0.667 |
11 | 3550.0056 | MSQIMYNYPAMLGHA | 1 | DPA1*01:03 | 7 | 2 | 30 | 8 | 47 | 40 | 0.99 | 0.93 | 1.000 |
12 | 3550.0055 | TEIRRSNAPRLVDLV | 1 | DPA1*01:03 | 1 | 0 | 36 | 10 | 47 | 40 | 1.27 | inf | 1.000 |
13 | 3550.0052 | GTEIRRSDAPRLVDL | 1 | DPA1*01:03 | 2 | 0 | 35 | 10 | 47 | 40 | 1.27 | inf | 1.000 |
14 | 3550.0051 | SNIKIIRIDEFRRCG | 1 | DPA1*01:03 | 1 | 0 | 36 | 10 | 47 | 40 | 1.27 | inf | 1.000 |
15 | 3550.0046 | HSNIKIIRIDEFRRY | 1 | DPA1*01:03 | 1 | 0 | 36 | 10 | 47 | 40 | 1.27 | inf | 1.000 |
16 | 3550.0028 | PYVIELDGQFCGQLT | 1 | DPA1*01:03 | 1 | 0 | 42 | 14 | 57 | 30 | 1.33 | inf | 1.000 |
17 | 3550.0026 | EWTVRHTVAAWPAVC | 1 | DPA1*01:03 | 1 | 0 | 42 | 14 | 57 | 30 | 1.33 | inf | 1.000 |
18 | 3550.0024 | GTEIRRSNAPRLVDLV | 1 | DPA1*01:03 | 1 | 0 | 38 | 12 | 51 | 36 | 1.31 | inf | 1.000 |
19 | 3550.0020 | HSNIKIIRIDEFRRYG | 1 | DPA1*01:03 | 1 | 0 | 38 | 12 | 51 | 36 | 1.31 | inf | 1.000 |
20 | 3550.0006 | AAVLRFQEAANKQKQ | 1 | DPA1*01:03 | 3 | 0 | 38 | 12 | 53 | 34 | 1.29 | inf | 1.000 |
21 | 3536.0170 | THSWEYWGAQLNAMK | 1 | DPA1*01:03 | 1 | 0 | 41 | 11 | 53 | 34 | 1.26 | inf | 1.000 |
22 | 3536.0147 | AGSLSALLDPSQGMG | 1 | DPA1*01:03 | 2 | 1 | 41 | 12 | 56 | 31 | 0.87 | 0.59 | 0.555 |
23 | 3536.0144 | SAMILAAYHPQQFIY | 1 | DPA1*01:03 | 1 | 0 | 42 | 13 | 56 | 31 | 1.30 | inf | 1.000 |
24 | 3536.0139 | PQWLSANRAVKPTGS | 1 | DPA1*01:03 | 1 | 0 | 42 | 13 | 56 | 31 | 1.30 | inf | 1.000 |
25 | 3536.0138 | LTSELPQWLSANRAV | 1 | DPA1*01:03 | 1 | 0 | 42 | 13 | 56 | 31 | 1.30 | inf | 1.000 |
26 | 3536.0136 | GCQTYKWETFLTSEL | 1 | DPA1*01:03 | 1 | 0 | 42 | 13 | 56 | 31 | 1.30 | inf | 1.000 |
27 | 3536.0133 | SSFYSDWYSPACGKA | 1 | DPA1*01:03 | 1 | 0 | 42 | 13 | 56 | 31 | 1.30 | inf | 1.000 |
28 | 3536.0132 | PVGGQSSFYSDWYSP | 1 | DPA1*01:03 | 1 | 0 | 42 | 13 | 56 | 31 | 1.30 | inf | 1.000 |
29 | 3536.0131 | LSIVMPVGGQSSFYS | 1 | DPA1*01:03 | 1 | 0 | 42 | 13 | 56 | 31 | 1.30 | inf | 1.000 |
30 | 3536.0121 | PSMGRDIKVQFQSGG | 1 | DPA1*01:03 | 1 | 0 | 42 | 13 | 56 | 31 | 1.30 | inf | 1.000 |
31 | 3536.0109 | ALGATPNTGPAPQGA | 1 | DPA1*01:03 | 1 | 0 | 40 | 12 | 53 | 34 | 1.29 | inf | 1.000 |
32 | 3536.0102 | GGHNGVFDFPDSGTH | 1 | DPA1*01:03 | 1 | 0 | 43 | 14 | 58 | 29 | 1.32 | inf | 1.000 |
33 | 3536.0071 | QTYKWETFLTSELPG | 1 | DPA1*01:03 | 1 | 0 | 42 | 14 | 57 | 30 | 1.33 | inf | 1.000 |
Table shows a part of the complete results obtained for a sample dataset.
inf, infinity; n/a, number of donors not tested with the specific peptide.
This complete report is usually too large and cumbersome to evaluate. For example, even considering only the 25 most common variants of the four polymorphic HLA class II molecules yields 100 different HLA molecules, and in the case of a set of 200 peptides, this generates an output with 20,000 entries. For this reason, the tool also generates a concise report (Table III) that lists, for each peptide, only the HLA alleles with RF values >2.0. This threshold is used because the determination of negative HLA association is not within the scope of the present application, and stronger RF (or OR) values are more likely to reflect HLA restrictions. HLA molecules that are associated with significant ORs and RFs and predicted to bind the corresponding epitope with high affinity (IEDB consensus prediction score less than the 15th percentile) (13) are considered as potential restrictions.
Peptide No. | Peptide ID | Peptide Sequence | Allele No. | Allele | A+R+ | A−R+ | A+R− | A−R− | No. of Donors | Response n/a | RF | OR | p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3531.0375 | MSQIMYNYPAMMAHA | 66 | DQB1*06:01 | 4 | 18 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.004 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 101 | DRB1*15:02 | 4 | 18 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.004 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 111 | DRB5*01:02 | 2 | 20 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.071 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 3 | DPA1*01:05 | 1 | 21 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.272 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 18 | DPB1*104:01 | 1 | 21 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.272 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 31 | DPB1*27:02 | 1 | 21 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.272 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 33 | DPB1*40:01 | 1 | 21 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.272 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 43 | DQA1*03 | 1 | 21 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.272 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 58 | DQB1*03:08 | 1 | 21 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.272 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 76 | DRB1*04:02 | 1 | 21 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.272 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 81 | DRB1*04:10 | 1 | 21 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.272 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 87 | DRB1*08:06 | 1 | 21 | 0 | 59 | 81 | 6 | 3.68 | inf | 0.272 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 100 | DRB1*15:01 | 11 | 11 | 1 | 58 | 81 | 6 | 3.38 | 53.97 | 0.000 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 110 | DRB5*01:01 | 14 | 8 | 5 | 54 | 81 | 6 | 2.71 | 17.84 | 0.000 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 25 | DPB1*14:01 | 2 | 20 | 1 | 58 | 81 | 6 | 2.45 | 5.65 | 0.178 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 62 | DQB1*05:02 | 2 | 20 | 1 | 58 | 81 | 6 | 2.45 | 5.65 | 0.178 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 84 | DRB1*08:02 | 2 | 20 | 1 | 58 | 81 | 6 | 2.45 | 5.65 | 0.178 |
1 | 3531.0375 | MSQIMYNYPAMMAHA | 39 | DQA1*01:03 | 5 | 17 | 4 | 55 | 81 | 6 | 2.05 | 3.96 | 0.056 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 43 | DQA1*03 | 1 | 7 | 0 | 73 | 81 | 6 | 10.13 | inf | 0.099 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 76 | DRB1*04:02 | 1 | 7 | 0 | 73 | 81 | 6 | 10.13 | inf | 0.099 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 84 | DRB1*08:02 | 2 | 6 | 1 | 72 | 81 | 6 | 6.75 | 21.84 | 0.025 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 111 | DRB5*01:02 | 1 | 7 | 1 | 72 | 81 | 6 | 5.06 | 9.69 | 0.189 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 62 | DQB1*05:02 | 1 | 7 | 2 | 71 | 81 | 6 | 3.38 | 4.90 | 0.271 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 66 | DQB1*06:01 | 1 | 7 | 2 | 71 | 81 | 6 | 3.38 | 4.90 | 0.271 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 86 | DRB1*08:04 | 1 | 7 | 2 | 71 | 81 | 6 | 3.38 | 4.90 | 0.271 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 101 | DRB1*15:02 | 1 | 7 | 2 | 71 | 81 | 6 | 3.38 | 4.90 | 0.271 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 71 | DRB1*01:01 | 3 | 5 | 7 | 66 | 81 | 6 | 3.04 | 5.47 | 0.055 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 60 | DQB1*04:02 | 2 | 6 | 5 | 68 | 81 | 6 | 2.89 | 4.40 | 0.140 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 47 | DQA1*04:01 | 3 | 5 | 8 | 65 | 81 | 6 | 2.76 | 4.73 | 0.072 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 13 | DPB1*04:01 | 6 | 2 | 17 | 56 | 81 | 6 | 2.64 | 9.54 | 0.006 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 68 | DQB1*06:03 | 1 | 7 | 3 | 70 | 81 | 6 | 2.53 | 3.26 | 0.346 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 14 | DPB1*04:02 | 3 | 5 | 10 | 63 | 81 | 6 | 2.34 | 3.70 | 0.113 |
2 | 3531.0461 | AGCQTYKWETFLTSE | 37 | DQA1*01:01 | 2 | 6 | 7 | 66 | 81 | 6 | 2.25 | 3.08 | 0.216 |
The table shows a part of the concise results obtained for a sample dataset.
inf, infinity; n/a, number of donors not tested with the specific peptide.
RATE predicts previously validated restrictions
To validate the approach, we examined whether RATE would successfully reidentify HLA class II restrictions experimentally determined in previous studies (16). Immune response and HLA-typing data for three previously validated epitope–HLA restrictions were analyzed using the program (16). As shown in Table IV, donors expressing HLA DRB1*15:01 accounted for 11 of 22 responders to the M. tuberculosis Rv0288/Rv3019c epitope MSQIMYNYPAMMAHA, with an OR of 54.0 and an RF of 3.4 (p < 0.001). The M. tuberculosis Rv3804c epitope AGCQTYKWETFLTSE and Rv3418c epitope GEEYLILSARDVLAV were predicted to be restricted by DPB1*04:01 (p = 0.0056) and DRB1*01:01 (p = 0.0249), respectively (Table IV). All three restrictions had been previously validated by peptide-tetramer staining of PBMCs from HLA-matched donors (16). Thus, RATE correctly reidentified known HLA-epitope restrictions.
Sequence | Allele | A+R+ | A−R+ | A+R− | A−R− | RF | OR | p^a |
---|---|---|---|---|---|---|---|---|
MSQIMYNYPAMMAHA | DRB1*15:01 | 11 | 11 | 1 | 58 | 3.4 | 54.0 | <0.001 |
AGCQTYKWETFLTSE | DPB1*04:01 | 6 | 2 | 17 | 56 | 2.6 | 9.5 | 0.006 |
GEEYLILSARDVLAV | DRB1*01:01 | 3 | 3 | 8 | 73 | 4.0 | 8.7 | 0.025 |
The response data and HLA typing from three previously validated tetramers (16) were matched using the RATE program.
^a p values calculated by Fisher’s exact test.
A+, genotyped HLA allele positive; A−, genotyped HLA allele negative; R+, epitope response positive; R−, epitope response negative.
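The RF and p columns of Table IV can be recomputed from the four cell counts alone. The sketch below is an illustration, not the RATE source code: the function names are hypothetical, RF is taken to be the response rate among allele-positive donors divided by the response rate among all tested donors (a definition that reproduces the tabulated RF values), and the p value is a two-sided Fisher's exact test written with the standard library only. The odds ratio shown is the simple cross-product estimate, which is an assumption; it gives 58.0 for the first row, whereas Table IV lists 54.0, so RATE evidently uses a different estimator that is not reproduced here.

```python
from math import comb


def relative_frequency(a, b, c, d):
    """RF (assumed definition): response rate in A+ donors / overall response rate.

    a = A+R+, b = A-R+, c = A+R-, d = A-R-.
    """
    n = a + b + c + d
    return (a / (a + c)) / ((a + b) / n)


def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio; infinite when an off-diagonal cell is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of
    all tables (with the same margins) that are no more likely than the
    observed one."""
    n = a + b + c + d
    allele_pos = a + c          # donors carrying the allele
    responders = a + b          # donors responding to the epitope
    denom = comb(n, responders)

    def prob(x):
        # x responders drawn from allele-positive donors, the rest from allele-negative
        return comb(allele_pos, x) * comb(n - allele_pos, responders - x) / denom

    lo = max(0, responders - (n - allele_pos))
    hi = min(allele_pos, responders)
    p_obs = prob(a)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Cell counts copied from Table IV: (A+R+, A-R+, A+R-, A-R-)
table_iv = {
    ("MSQIMYNYPAMMAHA", "DRB1*15:01"): (11, 11, 1, 58),
    ("AGCQTYKWETFLTSE", "DPB1*04:01"): (6, 2, 17, 56),
    ("GEEYLILSARDVLAV", "DRB1*01:01"): (3, 3, 8, 73),
}

for (peptide, allele), cells in table_iv.items():
    rf = relative_frequency(*cells)
    p = fisher_two_sided(*cells)
    print(f"{peptide} {allele}: RF={rf:.1f} p={p:.4f}")
```

Because the margins are fixed, the same function can be applied unchanged to any row of the concise results table, including rows whose OR is reported as inf.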
Discovery of novel HLA-peptide restrictions
To further validate the use of RATE to predict novel restrictions, we considered a dataset generated by testing various M. tuberculosis–derived epitopes as described previously (16) and by D.M. McKinney, C.S. Lindestam Arlehamn, V. Rozot, E. Makgotlho, W. Hanekom, T.J. Scriba, and A. Sette (manuscript in preparation). In brief, immune responses to various M. tuberculosis epitopes in PBMCs from individuals with LTBI were measured by IFN-γ–specific ELISPOT as representative of Th1 responses and reported as SFCs per million cells. These data were used to generate RATE-predicted restrictions.
In parallel, the HLA class II restrictions of five selected M. tuberculosis–derived epitopes (Rv1705c epitope FFGQNTAAIAATEAQ, Rv1195 epitope SSYAATEVANAAAGQ, Rv0288 epitope IMYNYPAMLGHAGDM, and Rv3874 epitopes AAVVRFQEAANKQKQ and AQAAVVRFQEAANKQ) were determined using single HLA-transfected cell lines (6). PBMCs from LTBI donors were incubated together with a panel of cell lines presenting the specific epitopes and expressing HLA molecules matching those expressed in the donors (Fig. 1). Responses were evaluated in a standard IFN-γ ELISPOT assay to measure Th1 responses. In the case of LTBI donors, the strong CD4+ T cell responses allow detection of IFN-γ responses directly ex vivo (16).
Novel HLA-epitope restrictions predicted by RATE. PBMCs were incubated with single HLA-transfected cells pulsed with (A) Rv1705c epitope FFGQNTAAIAATEAQ, (B) Rv1195 epitope SSYAATEVANAAAGQ, (C) Rv0288 epitope IMYNYPAMLGHAGDM, or (D) Rv3874 epitopes AAVVRFQEAANKQKQ and (E) AQAAVVRFQEAANKQ for 24 h. IFN-γ release was measured by ELISPOT. White bars show significant responses (p < 0.05), whereas the black bars represent nonsignificant responses. HLA alleles expressed by donor are presented in the table insert. Predicted binding from IEDB is listed below each allele tested. RFs and Fisher’s exact test p values (when significant) are listed for predicted restrictions. N/A, HLA-transfected cell lines that are not available.
As can be seen in Fig. 1A, a significant response to the peptide was observed in the case of DRB3*02:02-transfected cells, but not for any of the other lines transfected with HLA molecules expressed by the donor. These results match those obtained with the RATE program, where DRB3*02:02 was associated with an RF of 2.4 (p = 0.004) for the FFGQNTAAIAATEAQ epitope. Similar patterns of restrictions and RATE predictions were observed in the four additional epitopes tested (Fig. 1B–E). Thus, the RATE approach correctly predicted five new HLA restrictions.
Use of RATE to guide generation of HLA tetramers
As a next step toward validation, we tested whether RATE could predict HLA-peptide restrictions de novo as a means to guide generation of specific tetrameric staining reagents. Tetramer staining was used here, instead of the transfection assay described earlier, as an alternative method of validating restrictions and to show the versatility of the RATE predictions. In an initial set of experiments, we used the immune response dataset from a study of 31 HLA class II–typed donors vaccinated with the acellular B. pertussis vaccine (M.B.C. Dillon, T.A. Bancroft, R. Kolla, S. Paul, J. Sidney, B. Peters, and A. Sette, manuscript in preparation). In this cohort, little to no reactivity was detected ex vivo, but good T cell reactivity was detected after expansion in vitro with the vaccine component proteins. After expansion, epitopes were defined using a set of 785 overlapping peptides (16-mers overlapping by 8 residues) completely spanning the component proteins, of which 154 nonredundant epitopes induced positive Th1 and/or Th2 CD4+ T cell responses. RATE indicated potential restrictions for 35 of these epitopes.
More specifically, when the combined IFN-γ and IL-5 CD4+ T cell immune response reactivity was analyzed by RATE, epitope YYSNVTATRLLSSTNS from pertussis toxin subunit B129–144 was found to be associated with significant OR values (p < 0.05; Table V) for DRB1*07:01. Accordingly, the corresponding PE-conjugated tetramer was generated and staining was measured on PBMCs (Fig. 2). As expected, little staining was detected directly ex vivo, with positive staining for YYSNVTATRLLSSTNS of <0.01% (Fig. 2A). However, after 14 d of stimulation with the corresponding peptide, the tetramer displayed significant binding to CD3+CD4+CD8− cells, representing an increase of >20-fold compared with ex vivo staining (0.42%; Fig. 2B). The staining was specific because, as expected, no significant staining was detected on CD3+CD4−CD8+ cells from the same expanded PBMC cultures (Fig. 2C).
Sequence | Allele | A+R+ | A−R+ | A+R− | A−R− | RF | p^a | Predicted Binding^b (Percentile) |
---|---|---|---|---|---|---|---|---|
YYSNVTATRLLSSTNS | DRB1*07:01 | 3 | 1 | 5 | 22 | 2.9 | 0.043 | 0.76 |
AAFQAAHARFVAAAA | DRB1*07:01 | 7 | 12 | 4 | 58 | 2.7 | 0.003 | 0.03 |
MSQIMYNYPAMRAHA | DRB1*15:01 | 11 | 11 | 1 | 58 | 3.4 | 0.000 | 0.90 |
AAFQGAHARFVAAAA | DRB1*07:01 | 8 | 14 | 3 | 56 | 2.7 | 0.001 | 0.13 |
The response data and HLA typing from donors recently vaccinated with acellular B. pertussis or with LTBI were matched using the RATE program.
^a p values calculated by Fisher’s exact test.
^b Predicted binding values were obtained from IEDB (13).
A+, genotyped HLA allele positive; A−, genotyped HLA allele negative; R+, epitope response positive; R−, epitope response negative.
Novel HLA-epitope restrictions validated by tetramer binding. (A) Purified CD4+CD3+CD8− cells stained with YYSNVTATRLLSSTNS tetramer-PE. (B) CD3+CD4+CD8− PBMCs stained with YYSNVTATRLLSSTNS tetramer-PE after 14 d of stimulation with peptide. (C) CD3+CD4−CD8+ PBMCs stained with YYSNVTATRLLSSTNS tetramer-PE after 14 d of stimulation with peptide. (D) Top panels present CD3+CD8−CD19−CD11b−CD56− PBMCs stained with the indicated tetramer-PE combinations after anti-PE magnetic bead enrichment. Bottom panels present flow-through from magnetic bead enrichment. Numbers shown are percentages of tetramer+ CD4+ or CD8+ cells.
RATE predictions to generate M. tuberculosis–specific tetramers
To expand the results obtained with the B. pertussis epitope, we examined additional instances of RATE-predicted restrictions, in the context of the M. tuberculosis response rates in LTBI donors (as described earlier in the case of Fig. 1). In the case of LTBI and M. tuberculosis epitopes, as stated earlier, the strong CD4+ T cell responses allow detection of IFN-γ responses directly ex vivo (16).
Predicted promiscuous binding epitope validated by tetramer binding. (A) Purified CD4+CD3+CD8− cells stained with VKAQNITNKRAALIEA tetramer-PE. (B) CD3+CD4+CD8− PBMCs stained with VKAQNITNKRAALIEA tetramer-PE after 14 d of stimulation with peptide. (C) CD3+CD4−CD8+ PBMCs stained with VKAQNITNKRAALIEA tetramer-PE after 14 d of stimulation with peptide. Numbers shown are percentages of tetramer+ CD4+ or CD8+ cells.
Three different instances of restrictions predicted by the RATE approach were selected for further investigation, as shown in Table V. More specifically, Rv0287 epitope AAFQAAHARFVAAAA and Rv3020c epitope AAFQGAHARFVAAAA were predicted to be restricted by DRB1*07:01 (p = 0.003 and 0.001, respectively) and Rv3019c epitope MSQIMYNYPAMRAHA by DRB1*15:01 (p < 0.001). Accordingly, PBMCs from epitope-responsive LTBI subjects were stained ex vivo with the respective tetramer and enriched with anti-PE magnetic beads. In all three epitope–allele combinations, tetramer staining was detected at 13- to 160-fold higher percentages than those detected in the negative control flow-through from the magnetic bead enrichment (Fig. 2D).
These results further validate the use of the RATE approach to predict HLA restrictions for the purpose of generating functional tetrameric staining reagents. The method thus allows a rapid transition from HLA typing and response data to tetramers, essentially skipping the usual HLA association determination steps.
Combined RATE calculations to identify promiscuous restrictions
Many HLA allelic variants are functionally similar (24–26). As a result, a given epitope may be restricted by multiple HLA molecules encoded by a particular locus (especially if they are close variants), or even by molecules from different loci (promiscuous restriction). In these cases, the fact that multiple HLA molecules may restrict the response to a single epitope will (paradoxically) lower the statistical significance of each individual HLA restriction. To overcome this issue, we have developed an algorithm that, for a given peptide, calculates OR and RF values for all possible combinations of alleles for which predicted binding is within the 15th percentile, and tabulates the combinations of alleles associated with the best p value.
To validate the approach, we selected the epitope EEWEPLTKKGNVWEV from Phlp341–55, which was previously shown, using single HLA-transfected cell lines, to be promiscuously restricted by two HLA alleles, DRB1*08:01 and DRB1*11:01 (15). Indeed, when the reactivity of EEWEPLTKKGNVWEV in the cohort of allergic individuals (15) was analyzed by RATE, of 14 HLA alleles (including DRB1*08:01 and DRB1*11:01) expressed by donors responsive to the epitope, 8 combinations had RF values >1.5, but none was associated with a significant p value (p > 0.05).
When the results were analyzed by the promiscuous restrictions algorithm, the allele combination of DRB1*08:01 and DRB1*11:01 was found to be associated with a significant p value (p < 0.05 and RF = 3.1; Table VI). We thus concluded that RATE was able to correctly predict this previously described example of promiscuous restriction.
Allele(s) | A+R+ | A−R+ | A+R− | A−R− | RF | p^a | Predicted Binding (Percentile)^b |
---|---|---|---|---|---|---|---|
DRB1*08:01 | 1 | 2 | 0 | 22 | 8.3 | 0.120 | 2.58 |
DRB1*11:01 | 2 | 1 | 5 | 17 | 2.4 | 0.180 | 3.99 |
DRB1*08:01 + DRB1*11:01 | 3 | 0 | 5 | 17 | 3.1 | 0.024 | n/a^c |
The response data and HLA typing from Timothy grass–allergic donors (19) were matched using the RATE program promiscuity algorithm. The combined predictions are shown in the bottom row.
^a p values calculated by Fisher’s exact test.
^b Predicted binding values were obtained from IEDB (13).
^c Predicted binding percentiles are not available for combined alleles.
A+, genotyped HLA allele positive; A−, genotyped HLA allele negative; R+, epitope response positive; R−, epitope response negative.
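The combination search described above can be sketched in a few lines. This is a minimal illustration rather than the RATE implementation: the function names are hypothetical, a donor is assumed to count as allele-positive for a combination when carrying at least one of its alleles (which is consistent with the combined counts in Table VI), RF is assumed to be the response rate in allele-positive donors over the overall response rate, and the search is capped at pairs. The donor list is synthetic, constructed only so that its cell counts match Table VI, and `HYP*00:01` is a made-up allele included to show the 15th-percentile binding filter.

```python
from itertools import combinations
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p for counts a=A+R+, b=A-R+, c=A+R-, d=A-R-."""
    n, pos, resp = a + b + c + d, a + c, a + b
    denom = comb(n, resp)

    def prob(x):
        return comb(pos, x) * comb(n - pos, resp - x) / denom

    lo, hi = max(0, resp - (n - pos)), min(pos, resp)
    p_obs = prob(a)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def cell_counts(donors, alleles):
    """Tally the 2x2 table; A+ means the donor carries any allele in `alleles`."""
    a = b = c = d = 0
    for carried, responded in donors:
        positive = bool(carried & alleles)
        if positive and responded:
            a += 1
        elif responded:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rank_combinations(donors, percentiles, cutoff=15.0, max_size=2):
    """Score every allele combination that passes the binding filter,
    returning (p, RF, combination) tuples sorted by ascending p value."""
    candidates = sorted(al for al, pct in percentiles.items() if pct <= cutoff)
    scored = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            a, b, c, d = cell_counts(donors, set(combo))
            if a + c == 0 or a + b == 0:
                continue  # RF is undefined without carriers or responders
            rf = (a / (a + c)) / ((a + b) / (a + b + c + d))
            scored.append((fisher_two_sided(a, b, c, d), rf, combo))
    return sorted(scored)


# Synthetic donors (carried alleles, responded?) matching the Table VI counts.
donors = (
    [({"DRB1*08:01"}, True)] * 1
    + [({"DRB1*11:01"}, True)] * 2
    + [({"DRB1*11:01"}, False)] * 5
    + [({"HYP*00:01"}, False)] * 4   # hypothetical predicted non-binder
    + [(set(), False)] * 13
)
percentiles = {"DRB1*08:01": 2.58, "DRB1*11:01": 3.99, "HYP*00:01": 60.0}

for p, rf, combo in rank_combinations(donors, percentiles):
    print(" + ".join(combo), f"RF={rf:.1f}", f"p={p:.3f}")
```

With these synthetic donors the pair DRB1*08:01 + DRB1*11:01 ranks first, while each allele on its own stays above p = 0.05, mirroring the pattern in Table VI. Restricting candidates by predicted binding before enumerating keeps the number of combinations manageable and avoids pairing alleles that are unlikely to present the peptide.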
Use of RATE to de novo identify promiscuous HLA restrictions
To further validate that the RATE algorithm could also predict novel promiscuous restrictions, we selected the epitope VKAQNITNKRAALIEA from filamentous hemagglutinin (residues 1753–1768). When the reactivity of this epitope in the cohort of individuals vaccinated with the acellular B. pertussis vaccine (M.B.C. Dillon, T.A. Bancroft, R. Kolla, S. Paul, J. Sidney, B. Peters, and A. Sette, manuscript in preparation) was analyzed by RATE, no alleles were predicted as potential restrictions (p > 0.05; Table VII). However, when analyzed by the promiscuous restriction algorithm, the allele combination of DQB1*06:02 and DRB1*14:04 was associated with a significant p value (p < 0.05 and RF ≥ 2.6; Table VII), as the predicted binding of the epitope is in the 9.98th percentile for DQA1*01:02/DQB1*06:02 and in the 9.92nd percentile for DRB1*14:04.
Allele(s) | A+R+ | A−R+ | A+R− | A−R− | RF | p^a | Predicted Binding^b (Percentile) |
---|---|---|---|---|---|---|---|
DQB1*06:02 | 2 | 1 | 5 | 23 | 3.0 | 0.120 | 9.98 |
DRB1*14:04 | 1 | 2 | 0 | 28 | 10.3 | 0.097 | 9.92 |
DQB1*06:02 + DRB1*14:04 | 3 | 0 | 5 | 23 | 3.9 | 0.012 | n/a^c |
The response data and HLA typing from donors recently vaccinated with acellular B. pertussis were matched using the RATE program promiscuity algorithm. The combined predictions are shown in the bottom row.
^a p values calculated by Fisher’s exact test.
^b Predicted binding values were obtained from IEDB (13).
^c Predicted binding percentiles are not available for combined alleles.
A+, genotyped HLA allele positive; A−, genotyped HLA allele negative; R+, epitope response positive; R−, epitope response negative.
To validate this predicted restriction, we developed a DQB1*06:02/VKAQNITNKRAALIEA tetramer. To match the most common α/β combination in the population, we selected DQA1*01:02 for the α-chain. Staining with the PE-conjugated DQB1*06:02/VKAQNITNKRAALIEA tetramer was measured on PBMCs from a responsive donor (Fig. 3). As seen earlier with YYSNVTATRLLSSTNS and DRB1*07:01, limited staining was detected ex vivo (Fig. 3A). In contrast, 14 d of stimulation with the corresponding peptide dramatically increased tetramer binding to CD3+CD4+CD8− cells, from 0.24% ex vivo to 6.85% in the stimulated cells (Fig. 3B). Specificity of the stain was confirmed by low staining (0.19%) in CD3+CD4−CD8+ cells (Fig. 3C). The ideal validation for promiscuous restriction of this epitope would also include the corresponding DRB1*14:04 tetramer stain. Unfortunately, DRB1*14:04 is not available as a tetramer at this time, and a binding assay for this allele has not yet been developed. However, the promiscuity algorithm of RATE does take into account the in silico predicted binding to this allele. In conclusion, the promiscuity algorithm of RATE allowed identification of restrictions that could not be detected statistically when each allele was considered alone, and this prediction was experimentally verified where production of tetramer reagents was technically feasible.
Validation using class I allele and response data
Although the RATE tool was designed with a focus on class II alleles, we speculate that it might also be applicable to the determination of HLA class I restrictions. Datasets on epitopes of known restriction tested in groups of HLA-typed donors are not generally available in our laboratory, because we and most other groups routinely infer HLA class I restriction on the basis of the presence of specific motifs and HLA binding, and only test HLA-matched donors for reactivity.
However, analysis of a dataset on class I alleles obtained from the HIV Molecular Immunology Database by the Los Alamos National Laboratory (27) was able to address the applicability to HLA class I restrictions. The dataset (Study 4 in HLA Typing and Epitope Mapping section) (28), the largest among the available studies, contained class I HLA typing data for 631 HIV patients and the reaction data (SFC values) for 409 HIV-1 peptides in each of the respective positive patients (patients who gave a positive immune response to the peptide). The HIV database provides “A list of HIV CTL epitopes,” which lists the best defined CTL/CD8+ epitopes and the restricting class I alleles for each epitope.
There were 118 A list epitopes embedded in 108 peptides in the dataset, and the data from these 108 peptides and the HLA typing of the 631 patients were used to validate the RATE tool. The 118 epitopes (embedded in 108 peptides) were restricted by 67 class I alleles, and 45 of them were expressed by the patients in the study. These 45 alleles restricted 86 (of 108) peptides in the dataset. The results from the RATE tool showed 33 peptide–allele combinations where the allele was relatively frequent among the patients (present in ≥10% of patients) and expressed in at least one positive donor (A+R+ ≥ 1; Table VIII). Twenty-seven peptide–allele restrictions out of the earlier 33 combinations were significant hits (p ≤ 0.05); that is, the RATE tool could confirm 82% of the relevant peptide–allele combinations as restrictions.
Peptide Sequence | Allele | A+R+ | A−R+ | A+R− | A−R− | RF | OR | p | Allele Frequency (%) | Embedded Epitopes from A List Epitopes |
---|---|---|---|---|---|---|---|---|---|---|
NDIQKLVGKLNWASQIY | A*30:02 | 9 | 3 | 58 | 561 | 7.1 | 29.0 | 0.000 | 10.62 | KLNWASQIY |
TKELQKQIIKIQNFRVYY | A*30:02 | 18 | 8 | 49 | 556 | 6.5 | 25.5 | 0.000 | 10.62 | KIQNFRVYY |
SKLNWASQIYPGIKVRQL | A*30:02 | 6 | 28 | 61 | 536 | 1.7 | 1.9 | 0.160 | 10.62 | KLNWASQIY |
IKIQNFRVYYRDSRDPIW | A*30:02 | 24 | 16 | 43 | 548 | 5.7 | 19.1 | 0.000 | 10.62 | KIQNFRVYY |
TGTEELRSLYNTVATLY | A*30:02 | 30 | 66 | 37 | 498 | 2.9 | 6.1 | 0.000 | 10.62 | RSLYNTVATLY |
GIWQLDCTHLEGKIILVA | B*15:10 | 22 | 5 | 77 | 527 | 5.2 | 30.1 | 0.000 | 15.69 | THLEGKIIL |
GGHQAAMQMLKDTINEEA | B*15:10 | 11 | 17 | 88 | 515 | 2.5 | 3.8 | 0.002 | 15.69 | GHQAAMQML |
LQTGERDWHLGHGVSIEW | B*15:10 | 51 | 15 | 48 | 517 | 4.9 | 36.6 | 0.000 | 15.69 | WHLGHGVSI |
NTMLNTVGGHQAAMQMLK | B*15:10 | 7 | 7 | 92 | 525 | 3.2 | 5.7 | 0.003 | 15.69 | GHQAAMQML |
QMVHQAISPRTLNAWVKV | B*15:10 | 20 | 14 | 79 | 518 | 3.7 | 9.4 | 0.000 | 15.69 | HQAISPRTL |
QGYFPDWQNYTPGPGVRY | A*29 | 9 | 45 | 95 | 482 | 1.0 | 1.0 | 1.000 | 16.48 | YFPDWQNYT |
QITLWQRPLVSIKVGGQI | A*68:02 | 1 | 12 | 109 | 509 | 0.4 | 0.4 | 0.709 | 17.43 | ITLWQRPLV |
QLEKEPIAGAETFYVDGA | A*68:02 | 25 | 17 | 85 | 504 | 3.4 | 8.7 | 0.000 | 17.43 | GAETFYVDGA |
GLGQYIYETYGDTWTGV | A*68:02 | 40 | 13 | 70 | 508 | 4.3 | 22.3 | 0.000 | 17.43 | ETYGDTWTGV |
ETYGDTWTGVEALIRIL | A*68:02 | 35 | 11 | 75 | 510 | 4.4 | 21.6 | 0.000 | 17.43 | ETYGDTWTGV |
GAETFYVDGAANRETKI | A*68:02 | 65 | 61 | 45 | 460 | 3.0 | 10.9 | 0.000 | 17.43 | GAETFYVDGA |
GIQQEFGIPYNPQSQGVV | B*15:03 | 7 | 12 | 106 | 506 | 2.1 | 2.8 | 0.060 | 17.91 | IQQEFGIPY |
YHCLVCFQTKGLGISYGR^a | B*15:03 | 11 | 7 | 102 | 511 | 3.4 | 7.9 | 0.000 | 17.91 | FQTKGLGISY |
VKAACWWAGIQQEFGIPY^a | B*15:03 | 38 | 11 | 75 | 507 | 4.3 | 23.4 | 0.000 | 17.91 | IQQEFGIPY |
PRTLNAWVKVIEEKAF^a | B*15:03 | 58 | 15 | 55 | 503 | 4.4 | 35.4 | 0.000 | 17.91 | VKVIEEKAF |
AVFIHNFKRKGGIGGYSA^a | B*15:03 | 80 | 12 | 33 | 506 | 4.9 | 102.2 | 0.000 | 17.91 | FKRKGGIGGY |
YVDRFFKTLRAEQATQDV | B*15:03 | 18 | 132 | 95 | 386 | 0.7 | 0.6 | 0.038 | 17.91 | YVDRFFKTL |
GPKEPFRDYVDRFFKTLR | B*15:03 | 24 | 105 | 89 | 413 | 1.0 | 1.1 | 0.798 | 17.91 | YVDRFFKTL |
GKKAIGTVLVGPTPVNII | B*15:03 | 22 | 19 | 91 | 499 | 3.0 | 6.3 | 0.000 | 17.91 | GKKAIGTVL |
EVNIVTDSQYALGII | B*15:03 | 1 | 21 | 112 | 497 | 0.3 | 0.2 | 0.152 | 17.91 | VTDSQYALGI |
WVKVIEEKAFSPEVIPMF^a | B*15:03 | 2 | 39 | 111 | 479 | 0.3 | 0.2 | 0.020 | 17.91 | VKVIEEKAF |
IYPGIKVRQLCKLLRGAK | B*42:01 | 21 | 12 | 99 | 499 | 3.3 | 8.8 | 0.000 | 19.02 | YPGIKVRQL |
NYTPGPGVRYPLTFGWCF^a | B*42:01 | 64 | 136 | 56 | 375 | 1.7 | 3.2 | 0.000 | 19.02 | TPGPGVRYPL |
SKLNWASQIYPGIKVRQL | B*42:01 | 16 | 18 | 104 | 493 | 2.5 | 4.2 | 0.000 | 19.02 | YPGIKVRQL |
EVGFPVRPQVPLRPMTFK | B*42:01 | 67 | 150 | 53 | 361 | 1.6 | 3.0 | 0.000 | 19.02 | RPQVPLRPM |
MASEFNLPPIVAKEIVA^a | B*42:01 | 47 | 23 | 73 | 488 | 3.5 | 13.7 | 0.000 | 19.02 | LPPIVAKEI |
GATPQDLNTMLNTVGGH | B*42:01 | 90 | 90 | 30 | 421 | 2.6 | 14.0 | 0.000 | 19.02 | TPQDLNTML |
GIKQLQTRVLAIERYLK | B*58:02 | 36 | 3 | 96 | 496 | 4.4 | 62.0 | 0.000 | 20.92 | QTRVLAIERYL |
The analysis of the data resulted in 33 peptide–allele combinations where the allele was relatively frequent among the patients (present in ≥10% of patients) and expressed in at least one positive donor (A+R+ ≥ 1). Of these 33 combinations, 27 peptide–allele restrictions were significant hits (p ≤ 0.05).
^a Epitope–allele restrictions confirmed by Larsen et al. (29).
We also analyzed the peptide–allele restrictions confirmed by Erup Larsen et al. (29). Of the 18 peptides and respective restricting alleles, 7 peptide–allele combinations were present in the RATE results for the Los Alamos HIV database dataset, with the allele being expressed in at least one positive donor. All of these peptide–allele restrictions were significant hits according to the RATE results (p ≤ 0.05; Table VIII).
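The selection criteria used for Table VIII (allele present in ≥10% of patients, A+R+ ≥ 1, and p ≤ 0.05 for a hit) are simple enough to express directly. The sketch below applies them to five rows copied from Table VIII; it is an illustration only, the function names are hypothetical, the p values are the tabulated ones rather than recomputed, and allele frequency is assumed to be the share of allele-positive patients among all 631 patients, which reproduces the tabulated percentages.

```python
# (peptide, allele, A+R+, A-R+, A+R-, A-R-, tabulated p) copied from Table VIII
rows = [
    ("NDIQKLVGKLNWASQIY", "A*30:02", 9, 3, 58, 561, 0.000),
    ("SKLNWASQIYPGIKVRQL", "A*30:02", 6, 28, 61, 536, 0.160),
    ("GIWQLDCTHLEGKIILVA", "B*15:10", 22, 5, 77, 527, 0.000),
    ("QGYFPDWQNYTPGPGVRY", "A*29", 9, 45, 95, 482, 1.000),
    ("GIKQLQTRVLAIERYLK", "B*58:02", 36, 3, 96, 496, 0.000),
]


def allele_frequency(a, b, c, d):
    """Percentage of patients carrying the allele (assumed definition)."""
    return 100.0 * (a + c) / (a + b + c + d)


def select(rows, min_freq=10.0, alpha=0.05):
    """Return (relevant, hits): combinations passing the frequency and
    A+R+ >= 1 filters, and the subset of those with p <= alpha."""
    relevant, hits = [], []
    for peptide, allele, a, b, c, d, p in rows:
        if allele_frequency(a, b, c, d) >= min_freq and a >= 1:
            relevant.append((peptide, allele))
            if p <= alpha:
                hits.append((peptide, allele))
    return relevant, hits


relevant, hits = select(rows)
print(f"{len(hits)} of {len(relevant)} relevant combinations are significant")
```

Applied to the full table rather than this five-row excerpt, the same tally corresponds to the 27 of 33 combinations (82%) reported above.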
Discussion
In this article, we describe a novel approach to determine restriction of CD4+ T cell epitopes at the population level using genetic association methods. The main focus of our efforts was to design and validate a method to facilitate determination of restrictions for HLA class II molecules, and accordingly we mostly used data obtained in our laboratory where HLA class II responses were measured in a number of different settings, using 15-mer peptides and ELISPOT or intracellular cytokine staining assays, in a sufficient number of donors. We anticipate that this methodology will greatly facilitate the rapid generation of tetrameric staining reagents for investigations of immune reactivity in human populations.
Identification of the specific HLA locus and allele responsible for presenting an epitope for recognition by specific TCRs (HLA restriction) is necessary to fully characterize the immune response to an Ag. In this context, it is helpful to distinguish determination of the HLA molecule restricting responses to a given epitope in a particular donor (i.e., individual restriction of that peptide in that donor) from the HLA molecule(s) restricting the response in a population (i.e., general restriction of that peptide in that population).
Individual and general restrictions are not necessarily the same, and in practice they frequently differ. This is because not all donors expressing a given allele will generate T cells recognizing an epitope presented by that allele, because of immunodominance at the epitope and HLA levels (15, 18, 30, 31). Thus, population restriction does not always predict individual restriction. Conversely, because the same epitope can be presented by multiple alleles (epitope promiscuity) (30, 32–36), the fact that a given epitope is presented in a particular individual by a specific allele does not fully predict its general pattern of restriction in a population.
Experimental determination of HLA class II restriction is complex and technically challenging. Because HLA molecules are polygenic (genes are encoded by multiple loci) and polymorphic (genes are encoded by different allelic variants), this task is complicated due to the extreme HLA class II diversity present in human populations (5). To determine which gene and allelic variant act as restriction element, we used classic approaches such as inhibition by HLA locus–specific Abs and/or the use of matched/mismatched or single HLA-transfected cell lines.
Inhibition with anti-HLA class II Abs is a good way to determine the locus (but not the allele). A primary limitation is that there are no available Abs that can distinguish the DRB1 and DRB3/4/5 loci. In addition, questions remain in the field as to whether pan-reactive Abs for the DQ locus truly detect all DQ α and β combinations. Finally, anti-HLA Abs that inhibit T cell recognition may act by simply killing or inhibiting the APC, and hence need to be titrated to find a “sweet spot.” The use of HLA class II–transfected cell lines as described by McKinney et al. (6) is technically the most accurate, but using transfected cell lines is cumbersome and transfected cell lines are available for only the most frequent HLA alleles.
HLA binding and motif predictions could be used, as they are widely available. However, HLA binding in itself is a necessary, but not sufficient, requisite for T cell recognition. Because of the considerations listed earlier, it is of interest to develop alternative strategies. Similar work has been done in this direction, mostly focusing on class I alleles. For example, Kiepiela et al. (37) applied a statistical approach similar to ours to identify the HLA class I restriction of HIV peptides. Another statistical method, designed by Listgarten et al. (38), can identify restricting HLA alleles for a specific epitope from ELISPOT data from a set of patients and their respective HLA types. The HLA restrictor tool developed by Erup Larsen et al. (29) is based on the class I binding prediction method NetMHCpan and can predict the patient-specific epitopes restricted by alleles based on the patient HLA allele type. However, these methods are focused on class I alleles and are applicable to only one peptide or one patient's data at a time, whereas the RATE tool that we describe in this study was developed to address class II restrictions and relies on datasets describing the responses of multiple donors, based on genetic inference. Accordingly, the output of our method consists of the OR, RF, and p value from Fisher’s exact test for each peptide–allele combination, which together assess the strength of association between the peptide and allele.
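The three statistics named above can be illustrated for a single peptide–allele combination with a short Python sketch. It is not the RATE implementation. The function names are hypothetical, RF is assumed here to be the response rate among allele-positive donors divided by the response rate in the whole cohort, and a two-sided Fisher's exact test is assumed. With the counts from the first B*15:03 row of the HIV class I table above (58, 15, 55, 503), the sketch gives OR ≈ 35.4 and RF ≈ 4.4, in line with the tabulated values. Both quantities are unchanged if the two discordant cells are swapped.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """OR for a 2x2 table.

    a = allele+ responders, d = allele- non-responders,
    b and c = the two discordant cells (order does not affect the OR).
    """
    if b * c == 0:
        return float("inf")  # no discordant donors in one cell
    return (a * d) / (b * c)


def relative_frequency(a, b, c, d):
    """Assumed RF: response rate in allele+ donors / overall response rate."""
    n = a + b + c + d
    allele_pos, responders = a + b, a + c
    if allele_pos == 0 or responders == 0:
        return float("nan")
    return (a / allele_pos) / (responders / n)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities
    of all tables (same margins) that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))  # small tolerance for float ties
    return min(1.0, total)


if __name__ == "__main__":
    counts = (58, 15, 55, 503)
    print("OR =", round(odds_ratio(*counts), 1))
    print("RF =", round(relative_frequency(*counts), 1))
    print("p  =", fisher_exact_two_sided(*counts))
```

Exact integer binomial coefficients are used so that the sketch needs only the standard library; a statistics package would normally be preferred for large cohorts.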
The RATE tool described in this article represents a novel approach for determining HLA class II allele restriction of epitopes. In particular, it does not depend on experimental work and is most suited to analyze and extract immunological information from complex datasets encompassing large numbers of peptides and donors (as long as HLA typing data for each donor are available) as generated in clinical studies and vaccine trials. As such, the method is also likely to be of value in systems biology studies where large amounts of data are generated. Furthermore, because the program is agnostic to the MHC nomenclature used, it can be expanded to other species and we have, in fact, used it to analyze class II immunogenicity data obtained in the rhesus macaque system (B.R. Mothé, C.S. Lindestam Arlehamn, C. Dow, M.B.C. Dillon, R.W. Wiseman, P. Bohn, J. Karl, N.A. Golden, T. Gilpin, T.W. Foreman, M.A. Rodgers, S. Mehra, T.J. Scriba, J.L. Flynn, D. Kaushal, D.H. O'Connor, and A. Sette, submitted for publication).
Although the experimental examples and data analyses provided in this article are focused on HLA class II, we speculate that the program can also be applied to class I or any set of responses associated with large polymorphisms. The limited validation we have performed to date using the data from the Los Alamos HIV Molecular Immunology database suggests that this is likely to be the case.
However, certain caveats should be kept in mind when interpreting results derived from the approach. First, despite the various options provided in this article, it is likely that there are instances of ambiguous results, especially for peptides weakly or infrequently recognized. This is most commonly observed when too few subjects have been tested, or in the case of alleles that are either rare or very frequent. As a rule of thumb, strong associations can be detected with as few as 10–15 subjects, but ∼30 seems to be required in most cases, with the power of the analysis increasing dramatically as more subjects are included. The additional calculation of RF as described in this article increases the likelihood of detecting strong associations even with a limited number of subjects. Ambiguous instances are usually relatively few, and the ambiguity can be resolved with additional testing using transfected cell lines (39) or direct testing of tetrameric staining reagents.
Second, it is possible that HLA molecules encoded at different loci might be associated with statistically significant OR values for the same epitope. Although in some cases this may indeed be because of the promiscuity of the epitope, in others it may reflect the fact that the different HLA loci are physically close to one another in their chromosomal location, and thus are in strong linkage disequilibrium with each other (40, 41). For this reason, if alleles for more than one HLA locus are associated with significant OR values for a specific epitope, further analysis is warranted. We recommend that the locus with the best p value be considered first. If the combination of the data from this locus with any other locus does not lead to a better p value for the combined data, the association is likely due to linkage disequilibrium and should be discarded. In addition, instances where an allele encoded by a particular locus is not predicted to bind the epitope under consideration likely reflect an association due to linkage disequilibrium and should be considered with caution or discarded.
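The recommendation above (start from the allele with the best p value, then ask whether pooling it with a second allele improves the p value) can be sketched as follows. This is one possible reading of that check, not the iterative algorithm implemented in RATE. The donor representation, the function names, and the allele labels "A", "B", and "C" are all hypothetical, and a two-sided Fisher's exact test is assumed.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on a 2x2 table (stdlib only)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def association_p(donors, alleles):
    """p value for response vs. carrying ANY allele in `alleles`.

    donors: list of (set_of_carried_alleles, responded) pairs.
    """
    a = b = c = d = 0
    for carried, responded in donors:
        positive = bool(carried & alleles)
        if positive and responded:
            a += 1          # allele+ responder
        elif positive:
            b += 1          # allele+ non-responder
        elif responded:
            c += 1          # allele- responder
        else:
            d += 1          # allele- non-responder
    return fisher_two_sided(a, b, c, d)


def combination_improves(donors, best, other):
    """True if pooling `other` with `best` gives a strictly better p value."""
    return association_p(donors, {best, other}) < association_p(donors, {best})


# Hypothetical cohort 1: "B" only ever occurs together with "A" (linkage).
ld_donors = ([({"A", "B"}, True)] * 10 + [({"A"}, True)] * 5
             + [(set(), False)] * 15)
# Hypothetical cohort 2: "A" and "C" each present the epitope on their own.
promiscuous_donors = ([({"A"}, True)] * 8 + [({"C"}, True)] * 8
                      + [(set(), False)] * 14)

if __name__ == "__main__":
    print(combination_improves(ld_donors, "A", "B"))           # False
    print(combination_improves(promiscuous_donors, "A", "C"))  # True
```

In the first cohort, adding "B" leaves the 2×2 table unchanged, so the association of "B" would be attributed to linkage disequilibrium. In the second, pooling "A" and "C" accounts for every responder, which is the pattern expected for a promiscuous epitope. A predicted-binding filter, as also suggested above, would be applied separately.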
Third, although one of the advantages of RATE is the ability to globally analyze a dataset generated over multiple experiments (because it is in our experience impossible to determine restrictions in a single experiment by cellular methods for many donors and many peptides), the issue of reproducibility from experiment to experiment needs to be carefully considered. If significant experiment-to-experiment variability is present in a given dataset, this would correspondingly affect the conclusions. Therefore, the application of appropriate positive and negative controls within each experiment is necessary. In our own experiments, we always include a negative control and a positive PHA control, and ensure that each falls within acceptable ranges, based on our routine quality control of experimental assays.
Finally, we acknowledge that the experimental validation of the RATE approach is still somewhat limited. In total, RATE correctly predicted 10 novel restrictions (8 M. tuberculosis and 2 B. pertussis) and 5 previously validated restrictions (3 M. tuberculosis and 2 Timothy Grass). Evaluation of additional epitopes mapped by other investigators is limited by the fact that immunogenicity data need to be described on a donor-by-donor basis, and HLA typing must be available for each donor. Still, additional experimental work will clearly be needed to establish the success rate of the approach more firmly.
In conclusion, we have developed an automated method to infer HLA restriction from large datasets of T cell responses in HLA-typed subjects. The Web-accessible program calculates OR and relative frequencies from simple data tables, incorporates prediction of HLA binding capacity, and accounts for linkage disequilibrium and promiscuous recognition by iterative calculation of OR values for different combinations of HLA molecules. We consider the current algorithm and software implementation a proof of principle that it is possible to derive HLA restrictions based on genetic associations. To the best of our knowledge, the program presented in this article is the first that allows determination of restriction at the population level, and estimates response rate and immunodominance, as well as promiscuous restrictions. Accordingly, we believe it is important that the current prototype is made available to the scientific community. The tool is indeed available online at http://iedb-rate.liai.org, and we look forward to receiving user feedback for its improvement and optimization. We expect that future refinements of the approach will lead to improved results, for example, by more precisely modeling the statistical underpinnings of HLA linkage and promiscuous binding, and incorporating the predicted binding affinities as statistical priors rather than binary cutoffs.
Footnotes
This work was supported by National Institutes of Health Contracts HHSN272201200010C, HHSN272200900042C, HHSN272201400045C, and HHSN272200900044C and Bill and Melinda Gates Foundation Grant OPP1066265.
The online version of this article contains supplemental material.
Abbreviations used in this article:
IEDB, Immune Epitope Database; LTBI, latent Mycobacterium tuberculosis infection; OR, odds ratio; RATE, Restrictor Analysis Tool for Epitopes; RF, relative frequency; SFC, spot-forming cell.
References
Disclosures
The authors have no financial conflicts of interest.