Abstract
Accurate prioritization of immunogenic neoantigens is key to developing personalized cancer vaccines and distinguishing those patients likely to respond to immune checkpoint inhibition. However, there is no consensus regarding which characteristics best predict neoantigen immunogenicity, and no model to date has both high sensitivity and specificity and a significant association with survival in response to immunotherapy. We address these challenges in the prioritization of immunogenic neoantigens by (1) identifying which neoantigen characteristics best predict immunogenicity; (2) integrating these characteristics into an immunogenicity score, the NeoScore; and (3) demonstrating a significant association of the NeoScore with survival in response to immune checkpoint inhibition. One thousand random and evenly split combinations of immunogenic and nonimmunogenic neoantigens from a validated dataset were analyzed using a regularized regression model for characteristic selection. The selected characteristics, the dissociation constant and binding stability of the neoantigen:MHC class I complex and expression of the mutated gene in the tumor, were integrated into the NeoScore. A web application is provided for calculation of the NeoScore. The NeoScore results in improved, or equivalent, performance in four test datasets as measured by sensitivity, specificity, and area under the receiver operator characteristics curve compared with previous models. Among cutaneous melanoma patients treated with immune checkpoint inhibition, a high maximum NeoScore was associated with improved survival. Overall, the NeoScore has the potential to improve neoantigen prioritization for the development of personalized vaccines and contribute to the determination of which patients are likely to respond to immunotherapy.
This article is featured in Top Reads, p.
Introduction
Cancers arise through mutations in the genome of healthy human cells. As these mutations occur, some will produce mutated proteins, which have the potential to be processed into neoantigens that bind MHC class I and are presented on the cell surface. These neoantigens then act as tumor-specific targets with the potential to elicit a cytotoxic CD8+ T cell response (1–4). Tumor-specific neoantigens have strong potential to be targets of T cell–mediated destruction because they are not subject to immune tolerance or nonreactivity to self. However, there are two key ways in which the earlier mechanism may fail in tumor destruction. For one, recent evidence suggests that most neoantigens do not elicit an immune response in their natural state, termed immunological ignorance (5). Second, once a T cell response is mounted to a neoantigen, that response may become exhausted over time because of inhibitory signals from the tumor microenvironment (6).
Circumventing these limitations to reinvigorate the host immune response has been the goal of many recent cancer therapies. Personalized cancer vaccines have been created and have demonstrated early success to address immunological ignorance (7–12). These vaccines have taken several forms, including direct exposure to neoantigens (11), neoantigen-encoding RNA vaccines (12), neoantigen-loaded dendritic cell vaccines (7), and adoptive transfer of neoantigen-specific T cells (2, 13). Each of these methods requires accurate knowledge of the neoantigens presented by the tumor cell with the potential to elicit an immune response. In silico prioritization methods have been used to prioritize which set of neoantigens should be experimentally tested, but the ability to prioritize the immunogenicity of each neoantigen, with high sensitivity and specificity, is still limited (14–18).
Exhaustion of T cells and attenuation of T cell activation can be overcome by immune checkpoint inhibition, such as mAbs against PD-1 or CTLA-4. Immune checkpoint inhibition blocks inhibitory signals to the T cells to enhance T cell–mediated tumor destruction (19–21). However, immune checkpoint inhibition is effective in only a subset of patients (6), and there is no consensus on how to prioritize which patients will respond (18, 21–26). In a pan-cancer analysis, Yarchoan et al. (27) demonstrated that cancer types with a higher mutational burden, such as melanoma, had improved response to anti–PD-1 therapy compared with cancer types with a lower mutational burden. There are limitations to mutational burden as a predictor of response to immune checkpoint inhibition. First, in multiple myeloma, there was an association of increased tumor mutational burden with decreased response to immune checkpoint inhibition (28). Second, in melanoma, the association between the tumor mutational burden and response to immune checkpoint inhibition was confounded by the melanoma subtype (29). Finally, in lung cancers that progressed after treatment with immune checkpoint inhibitors, there was an increase in the tumor mutational burden compared with the pretreatment state (30), thus contradicting the expectation that tumor cells resistant to immune checkpoint inhibition would have a low number of neoantigens. Together, these findings suggest that tumor mutational burden is not sufficient for predicting response to immune checkpoint inhibition.
Several recent papers have been dedicated to predicting neoantigen immunogenicity based on the characteristics of validated immunogenic neoantigens (16–18, 23, 31). However, there is no consensus regarding which neoantigen characteristics are important for the prioritization of immunogenic neoantigens or the best way to combine the characteristics into an overall immunogenicity score. To identify the characteristics that best prioritize the immunogenicity of the neoantigens, we applied a model-based approach to evaluate neoantigen characteristics encompassing expression, processing, presentation, and T cell recognition in prioritizing neoantigen immunogenicity. The selected characteristics were then combined into an overall logistic regression model called the “NeoScore.” The development of the NeoScore has largely focused on melanoma, except for one lung cancer patient included in the Tumor Neoantigen Selection Alliance (TESLA) consortium dataset (16).
Immune checkpoint inhibition and personalized neoantigen vaccines are particularly effective in mutation-rich melanoma (25, 32–34). However, even in melanoma, a positive outcome from these interventions is not assured (6). To assess the clinical utility of the NeoScore and its ability to improve assessment of outcome in melanoma, the relationship of the NeoScore to survival in response to immunotherapy was tested using the datasets of Van Allen et al. (21) and Liu et al. (29).
Materials and Methods
Datasets
For training of the neoantigen prioritization model, whole-exome sequencing (WES) and RNA sequencing (RNAseq) data were obtained from the TESLA consortium database on Synapse (16). The TESLA consortium data came from four patients with melanoma and a single lung cancer patient. The TESLA consortium provided RNAseq data, WES data, and clinically determined HLA types for each of these patients to 28 teams and used the neoantigen rankings from 25 of the teams to prioritize which neoantigens were experimentally validated. The neoantigens validated for their ability to elicit a T cell response were selected by two guidelines: (1) the top 5 neoantigens ranked by each team were tested, and (2) the neoantigens that came up most frequently in the top 50 ranked neoantigens for each team were selected. The neoantigens tested were also constrained by HLA restriction requirements for the validation experiments. The final available dataset includes five patients with a total of 347 neoantigens that had been tested for their ability to elicit a T cell response using multimer staining. The TESLA consortium found that 26 of the 347 tested neoantigens elicited a T cell response in an unvaccinated state (16). Neoantigens were used for model creation in this manuscript if they were (1) tested for immunogenicity by the TESLA consortium, (2) identified by either GATK Mutect2 or Strelka, and (3) had expression data. Accession information for the TESLA consortium dataset is included in Table I.
Summary of datasets used, materials used, and accession information
Dataset . | Materials Used . | Accession Information . |
---|---|---|
TESLA consortium (16) | Raw RNAseq and WES data | https://www.synapse.org/#!Synapse:syn21048999/wiki/603788 |
Lists of validated neoantigens and the T cell response | Supplemental Table IV | |
Cohen (35) | Raw RNAseq data | https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP062169 |
Lists of validated neoantigens and the T cell response | Supplemental Table II | |
Strønen (36) | Lists of validated neoantigens and the T cell response | Supplemental Table VIII |
Carreno (7) | Lists of validated neoantigens and the T cell response | Supplemental Tables I–III |
Ott (11) | Lists of validated neoantigens and the T cell response | Supplemental Table IV |
Van Allen (21) | Raw RNAseq and WES data | dbGaP accession number (phs000452.v3.p1) https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000452.v3.p1 |
Liu (29) | Raw RNAseq and WES data | |
Rizvi (25) | Raw WES data | dbGaP accession number (phs000980.v1.p1) https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000980.v1.p1 |
Dataset . | Materials Used . | Accession Information . |
---|---|---|
TESLA consortium (16) | Raw RNAseq and WES data | https://www.synapse.org/#!Synapse:syn21048999/wiki/603788 |
Lists of validated neoantigens and the T cell response | Supplemental Table IV | |
Cohen (35) | Raw RNAseq data | https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP062169 |
Lists of validated neoantigens and the T cell response | Supplemental Table II | |
Strønen (36) | Lists of validated neoantigens and the T cell response | Supplemental Table VIII |
Carreno (7) | Lists of validated neoantigens and the T cell response | Supplemental Tables I–III |
Ott (11) | Lists of validated neoantigens and the T cell response | Supplemental Table IV |
Van Allen (21) | Raw RNAseq and WES data | dbGaP accession number (phs000452.v3.p1) https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000452.v3.p1 |
Liu (29) | Raw RNAseq and WES data | |
Rizvi (25) | Raw WES data | dbGaP accession number (phs000980.v1.p1) https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000980.v1.p1 |
For testing the neoantigen prioritization model, lists of tested neoantigens from melanoma tumors were obtained (7, 11, 35, 36). These datasets contain 357 tested neoantigens (n = 7 immunogenic neoantigens) (35), 149 tested neoantigens (n = 18 immunogenic neoantigens) (11), 57 tested neoantigens (n = 11 immunogenic neoantigens) (36) and 21 tested neoantigens (n = 9 immunogenic neoantigens) (7). The neoantigens were tested for immunogenicity with tetramer staining and cytokine secretion (35), ELISPOT (11), and multimer staining (7, 36). Strønen et al. (36), Ott et al. (11), and Carreno et al. (7) also provided the expression data quantified as either fragments per kilobase of transcript per million mapped reads, reads per kilobase of transcript per million mapped reads, or transcripts per million (TPM) (7, 11, 35, 36). For the Cohen et al. dataset (35), no expression data were provided. The RNAseq data from the Cohen dataset were obtained and analyzed as described later for read count quantification. Accession information is provided in Table I.
Finally, for survival analysis, WES and RNAseq data were obtained from the Van Allen et al. (21), Liu et al. (29), and Rizvi et al. (25) cohorts (all accession information is available in Table I). The inclusion criteria for the Van Allen dataset were as follows: (1) both WES and RNAseq data available, and (2) cutaneous melanoma as the primary lesion. All patients in the Van Allen cohort (21) underwent treatment with an anti–CTLA-4 mAb (ipilimumab). Inclusion criteria for the Liu et al. (29) dataset were as follows: (1) both WES and RNAseq data available, (2) cutaneous melanoma as the primary lesion, and (3) no prior treatment with an anti–CTLA-4 mAb (37). All patients in the Liu cohort (29) underwent treatment with an anti–PD-1 mAb (nivolumab or pembrolizumab). All samples from the Rizvi et al. (25) dataset were included in the analysis, including adenocarcinoma, squamous cell carcinoma, and non–small cell lung cancer. All patients in the Rizvi cohort underwent treatment with an anti–PD-1 mAb (pembrolizumab only).
Data preparation
WES and RNAseq FASTQ files from the TESLA consortium, Liu et al. (29) and Van Allen et al. (21) RNAseq FASTQ files from Cohen et al. (35), and WES FASTQ files from Rizvi et al. (25) were visualized for quality using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). FASTQ files were trimmed for quality using Trimmomatic (38) IlluminaClip with the following parameters: seed_mismatches = 2, palindrome_clip_threshold = 30, simple_clip_threshold = 10, leading = 10, trailing = 10, winsize = 4, and winqual = 15. Quality was then revisualized after trimming. Trimmed WES reads were mapped to the GRCh38.p7 reference genome, from 1000 genomes (39), and read group labels were added using BWA-mem (H. Li, manuscript posted on arXiv, DOI: 10.6084/M9.FIGSHARE.963153.V1). SAM files were converted to BAM and coordinate sorted (40). The BAM files were then converted to pileup format using SAMtools v.1.4 (40).
Identification of somatic mutations and neoantigen generation
Single-nucleotide variants (SNVs) and small insertions/deletions (indels) were identified using four programs, along with their recommended filters: GATK Mutect2 version 4.1.7.0 with default parameters, Varscan2 version 2.3.9 with minimum coverage of 10, minimum variant allele frequency (VAF) of 0.08, and somatic p value of 0.05, Strelka version 2.9.2 with default parameters, and LoFreq version 2.1.1 with default parameters (Refs. 41–43 and D. Benjamin, T. Sato, K. Cibulskis, G. Getz, C. Stewart, and L. Lichtenstein, manuscript posted on bioRxiv, DOI: 10.1101/861054). Results for GATK Mutect2 were filtered with the recommended FilterMutectCalls, and Varscan2 results were filtered using the Perl false-positive filter (https://github.com/ckandoth/variant-filter). Results from all four programs were examined with and without their respective filters. LoFreq results were not filtered to allow maximal potential to identify the missing mutations. Matched normal samples were used as the reference for each sample. The overlap of GATK Mutect2 and Strelka was used for the identification of SNVs and indels in the Liu et al. (29), Van Allen et al. (21), and Rizvi et al. (25) datasets for assessing the association of the NeoScore model with survival in response to immune checkpoint inhibition in melanoma.
Somatic mutations were separated from those SNVs that fell within 1 bp of an indel position, because these are likely to be false positives because of alignment errors. The mutations were annotated using the Variant Effect Predictor tool from Ensembl version 90.9 (44). Then, 21mer amino acid sequences were generated for each mutation using pVAC-Seq tools version 3.0.5 (45). Finally, the 21mers were split into all possible 9mers and 10mers, where the mutation of interest was in every location. The full pipeline from read mapping through to the identification of somatic mutations is available at https://github.com/SexChrLab/Cancer_Genomics.
Because the clinical samples for Liu et al. (29) and Van Allen et al. (21) were composed of formalin-fixed, paraffin-embedded (FFPE) samples, we considered applying an FFPE filter to remove false-positive mutations introduced by FFPE storage. However, the characteristic G→A and C→T mutations introduced by FFPE processing overlap with the mutational signature introduced by UV radiation. Removal of the FFPE signature also removed the characteristic bias toward G→A and C→T mutations in the samples; therefore, an FFPE filter was not applied.
Calculation of neoantigen characteristics
For each of the validated neoantigens, neoantigen characteristics with potential significance in predicting expression, processing, MHC binding, and TCR recognition probability were calculated as described in this article. The full pipeline for calculation and processing of the neoantigen characteristics can be found at https://github.com/ElizabethBorden/Process_peptide_lists. A log10 transformation was applied if the distribution of the characteristic had a large degree of skewness. The difference in the values for each of the neoantigen characteristics between immunogenic and nonimmunogenic neoantigens was assessed using a two-sample, two-sided t test. Correlation coefficients were calculated using Spearman correlation coefficients. The p values <0.05 were considered statistically significant.
Expression
For the datasets from Cohen et al. (35), the TESLA consortium, and Liu et al. (29), transcriptome assembly and read count quantifications were completed with Salmon version 0.11.3, using the Ensembl GRCh38.p7 reference genome (46, 47). mRNA expression in units of TPM was log10 transformed, and a constant of 0.1 was added to all values before the transformation to avoid taking the log of zero. To account for the different units used across each dataset, expression values were centered and normalized by subtracting the mean and then dividing by the SD.
Clonality
VAF
VAF was calculated by GATK Mutect2 (Benjamin et al., manuscript posted on bioRxiv, DOI: 10.1101/861054). Because the VAF was calculated only by GATK Mutect2, but not for Strelka, 14 missing data points were estimated as the average of the rest of the data. No statistically significant difference in the VAF was observed between immunogenic and nonimmunogenic neoantigens, either with or without these data points.
Processing
Cleavage and TAP transport potentials were calculated for each of the available neoantigens using NetCTLpan1.0 (50).
Dissociation constants
Dissociation constants of the neoantigen:MHC class I complex were calculated in nanomolar (nM) units using NetMHCpan4.0 (51). These values were log10 transformed before inclusion in the model.
Binding stability
Hydrophobicity method from the TESLA consortium
The number of hydrophobic residues was divided by the total number of residues in the neoantigen to create a “hydrophobicity fraction” (16). In keeping with the methods of the TESLA consortium, hydrophobic residues in this study were isoleucine, leucine, phenylalanine, methionine, tryptophan, valine, and cysteine.
Hydrophobicity with empirical prevalence
The hydrophobicity fraction was calculated as described for the TESLA consortium using the amino acids found to be empirically prevalent by Chowell et al. (53). Proline, leucine, and methionine were considered to have a high probability and given a score of +2. Glycine, tryptophan, phenylalanine, isoleucine, and valine were found to have a medium probability and given a score of +1. All others were given a score of 0.
Hydrophobicity Łuksza method
A neoantigen was given a score of zero if the mutation at the anchor residue changed from a hydrophobic to a hydrophilic residue (23). All other neoantigens were given a score of 1. Hydrophobic neoantigens in this study were defined as alanine, isoleucine, leucine, methionine, phenylalanine, tryptophan, tyrosine, and valine.
TCR recognition
Prediction of TCR recognition probability, R, was calculated as described previously (23). A BLOSUM62 similarity matrix was used to assess the sequence similarity between a neoantigen and the closest matched known T cell epitope from the Immune Epitope Database (54). The sequence similarity was then used in place of binding energies, and the TCR recognition probability was calculated as:
where is the sequence similarity, a is the horizontal displacement of the binding curve, and k sets the steepness of the curve at a. Based on the model fit by Łuksza et al. (23), the parameters a and k were set to 26 and 4.87, respectively. These parameters were optimized for both melanoma and lung cancer patients, the two cancer populations included in our training and test datasets. Finally, Z(k) is the partition function over the unbound state and all bound states, calculated as follows:
Sequence similarity
The closest matched human peptide was identified using Blast v.2.10.1 (55), and the sequence similarity of the potential neoantigen to the closest matched human peptide was calculated using a BLOSUM62 matrix, as described previously (17). The sequence similarity was normalized across neoantigen length by dividing by the number of amino acids.
Amplitude
Dissociation constants for the neoantigen:MHC class I complex were then adjusted by the dissociation constants for the closest matched normal human peptide:MHC class I complex, a characteristic called amplitude. The ratio was taken of the dissociation constant of the closest matched human peptide:MHC class I () to the dissociation constant of the neoantigen:MHC class I () as follows (23):
Analysis of anchor versus nonanchor residue mutations
Neoantigens were separated by their mutation position (anchor versus nonanchor residues). Then, both the amplitude and the dissociation constant of the neoantigen:MHC class I complex were compared between the immunogenic and nonimmunogenic neoantigens within each group. Comparisons were done using a two-sample, two-sided t test. The p values <0.05 were considered statistically significant.
HLA typing
Elastic net regression modeling
The TESLA consortium data were split into 1000 random subgroups containing all the immunogenic neoantigens (n = 26) and an equal number of randomly selected nonimmunogenic neoantigens. An elastic net regression was performed using the glmnet package in R for each of these splits (57). The selected coefficients and area under the receiver operator characteristics curve (AUC) from each model were tabulated across the 1000 splits. An optimal threshold for the model was selected in the TESLA dataset to optimize both sensitivity and specificity by the Youden Index and was implemented through the optimal cutpointr package from R statistical software (https://github.com/thie1e/cutpointr).
Logistic regression modeling
Logistic regression modeling was performed, and optimism values were calculated with the RMS package in R statistical software (https://cran.r-project.org/web/packages/rms/rms.pdf).
Survival analysis
To avoid scaling expression values on a patient-level basis, we determined coefficients on the TESLA consortium data for log10-transformed, nonscaled TPM expression data. The intercept changed to −1.7951 and the expression coefficient to 1.2868, but there was no change in the coefficients of the dissociation constant or binding stability and no effect on the performance of the model. The optimal thresholds for each of the mutational burden, neoantigen burden, and maximum NeoScore were determined by optimizing based on an adjusted log-rank test implemented in the MaxStat package in R (58). Then, the survival analysis was performed between scores above the selected threshold and below, using Kaplan Meier estimation and the log-rank test statistic. The p values <0.05 were considered statistically significant.
Results
Overlap of Strelka and GATK Mutect2 identifies the maximum number of validated immunogenic mutations
The first step in effective prioritization of neoantigens is to identify a high-fidelity list of somatic mutations. Isolating somatic mutations in cancers is more difficult than variant calling in normal tissue because cancers do not follow the typical rules of copy number or heterozygosity and often consist of multiple clonal populations with normal tissue contamination. The TESLA consortium validated neoantigens derived from mutations identified by 25 teams. The methods used for identifying the somatic mutations were not reported, making it difficult to reproduce all neoantigens (16). To maximize the immunogenic mutations identified, three highly rated programs were used to identify SNVs and small insertions and deletions (indels) for the data from the TESLA consortium: VarScan2, GATK Mutect2, and Strelka (Refs. 41, 42, and Benjamin et al., manuscript posted on bioRxiv, DOI: 10.1101/861054). A large degree of overlap was found in the ability of each program to identify the 34 validated immunogenic mutations. As shown in (Fig. 1, GATK Mutect2 and Strelka successfully identified 27/34 mutations, while VarScan2 identified 24/34. The mutations from each of these programs overlapped, such that 27 was the maximum number of immunogenic mutations identified. To ensure that the success of VarScan2 in identifying the immunogenic mutations was not hindered by over-filtering, the unfiltered output was assessed and only one immunogenic mutation had been eliminated by filtration steps (data not shown). GATK Mutect2 and Strelka identified the same number of immunogenic mutations with or without their respective filters (data not shown). LoFreq was also tested for the identification of somatic mutations, because LoFreq is optimized to call low-frequency mutations (43). However, LoFreq did not identify any additional immunogenic mutations (data not shown). The neoantigens derived from unidentified immunogenic mutations may be because of the reference genome version, peptide generation steps, or mutations identified by programs that were not tested. Overall, the combination of GATK Mutect2 and Strelka identified greater than or equal to the number of validated immunogenic neoantigens as in all other reported pipelines (16) while simultaneously decreasing the total number of potential neoantigens by 89.44%.
Maximum number of validated immunogenic mutations identified by the overlap of Strelka and GATK Mutect2. Comparison of the number of neoantigens derived from SNVs and small insertions and deletions (indels) by each of three different programs (Varscan2, GATK Mutect2, and Strelka) in the TESLA consortium dataset. Red dots represent the validated immunogenic neoantigens from each patient. The small circle represents validated immunogenic neoantigens that were not identified by the three programs.
Maximum number of validated immunogenic mutations identified by the overlap of Strelka and GATK Mutect2. Comparison of the number of neoantigens derived from SNVs and small insertions and deletions (indels) by each of three different programs (Varscan2, GATK Mutect2, and Strelka) in the TESLA consortium dataset. Red dots represent the validated immunogenic neoantigens from each patient. The small circle represents validated immunogenic neoantigens that were not identified by the three programs.
Dissociation constant and binding stability of the neoantigen:MHC class I complex and expression are significantly different between immunogenic and nonimmunogenic neoantigens
A set of computationally predicted neoantigen characteristics were calculated for the expression, processing, presentation, and TCR recognition of SNV and small indel-derived neoantigens (Fig. 2A). Each of these characteristics was included in the development of the NeoScore model. (Fig. 2B demonstrates that the set of characteristics included in model development is inclusive of all of those considered by the three models to which the NeoScore is compared. To begin, the distribution of each of these characteristics was assessed for immunogenic and nonimmunogenic neoantigens from the TESLA consortium dataset (Table I).
Expression, dissociation constant, and stability are significantly different between immunogenic and nonimmunogenic neoantigens. (A) Steps in the formation of an MHC class I–restricted immunogenic neoantigen. (B) Computationally predicted neoantigen characteristics included in development of the NeoScore model in comparison with three other models. Expression consists of the gene-level mRNA expression and clonality or VAF of the mutation. Processing consists of proteasomal cleavage and TAP transport potential. MHC class I binding is assessed through the MHC class I dissociation constant, binding stability, and hydrophobicity (Fig. 3) of the neoantigen. The potential to stimulate a T cell response is assessed by the similarity to known T cell epitopes, the sequence similarity to normal human peptides, and the relative MHC class I dissociation constant compared with the dissociation constant of the closest matched normal peptide. All 10 characteristics listed under the NeoScore model were considered for inclusion, and those in bold (expression, dissociation constant, and binding stability) were the characteristics selected for inclusion in the final model. (C–K) Comparison of the distribution of the characteristics for immunogenic and nonimmunogenic neoantigens. (C) Log10-transformed distribution of mRNA expression. (D) Distribution of VAF calculated by GATK Mutect2. (E and F) Distribution of proteasomal cleavage and TAP transport potential calculated by NetCTLpan. (G) Log10-transformed distribution of the MHC class I to neoantigen dissociation constant calculated by NetMHCpan. (H) Log10-transformed distribution of the MHC class I to neoantigen binding stability calculated by NetMHCstabpan. (I) TCR recognition probability calculated using the model from Łuksza et al. (23). (J) Log10-transformed distribution of the BLOSUM62 sequence similarity between the closest matched human peptide and the neoantigen of interest, divided by the length of the neoantigen to normalize. (K) Log10-transformed distribution of the amplitude calculated as the ratio of the dissociation constant of the closest matched human peptide to MHC class I to the dissociation constant of the neoantigen to MHC class I. **p < 0.001, ***p < 10−5.
Expression, dissociation constant, and stability are significantly different between immunogenic and nonimmunogenic neoantigens. (A) Steps in the formation of an MHC class I–restricted immunogenic neoantigen. (B) Computationally predicted neoantigen characteristics included in development of the NeoScore model in comparison with three other models. Expression consists of the gene-level mRNA expression and clonality or VAF of the mutation. Processing consists of proteasomal cleavage and TAP transport potential. MHC class I binding is assessed through the MHC class I dissociation constant, binding stability, and hydrophobicity (Fig. 3) of the neoantigen. The potential to stimulate a T cell response is assessed by the similarity to known T cell epitopes, the sequence similarity to normal human peptides, and the relative MHC class I dissociation constant compared with the dissociation constant of the closest matched normal peptide. All 10 characteristics listed under the NeoScore model were considered for inclusion, and those in bold (expression, dissociation constant, and binding stability) were the characteristics selected for inclusion in the final model. (C–K) Comparison of the distribution of the characteristics for immunogenic and nonimmunogenic neoantigens. (C) Log10-transformed distribution of mRNA expression. (D) Distribution of VAF calculated by GATK Mutect2. (E and F) Distribution of proteasomal cleavage and TAP transport potential calculated by NetCTLpan. (G) Log10-transformed distribution of the MHC class I to neoantigen dissociation constant calculated by NetMHCpan. (H) Log10-transformed distribution of the MHC class I to neoantigen binding stability calculated by NetMHCstabpan. (I) TCR recognition probability calculated using the model from Łuksza et al. (23). (J) Log10-transformed distribution of the BLOSUM62 sequence similarity between the closest matched human peptide and the neoantigen of interest, divided by the length of the neoantigen to normalize. (K) Log10-transformed distribution of the amplitude calculated as the ratio of the dissociation constant of the closest matched human peptide to MHC class I to the dissociation constant of the neoantigen to MHC class I. **p < 0.001, ***p < 10−5.
Expression was included in the NeoScore model development in two ways: (1) mRNA expression level and (2) VAF. Expression levels from RNAseq data were calculated as the TPM expression of the gene from which the neoantigen is derived. Immunogenic neoantigens had a significantly higher expression level than nonimmunogenic neoantigens (p = 2.875 × 10−4; (Fig. 2C). The clonality of the neoantigen was calculated by FastClone (49), but FastClone did not converge for 4/6 of the samples, so the VAF calculated by GATK-mutect2 was included in the development of the NeoScore model instead. No significant difference in the VAF was observed between immunogenic and nonimmunogenic neoantigens (p = 0.359; (Fig. 2D).
Processing steps for the neoantigen were included in the development of the NeoScore model as both the proteasomal cleavage potential and TAP potential scores. Although there was a higher average for both characteristics in the immunogenic than nonimmunogenic neoantigens, no statistically significant difference was observed (p = 0.817 and p = 0.836, respectively) (Fig. 2E, 2F). The binding to the MHC class I molecule was then considered by both the dissociation constant and stability of the neoantigen:MHC class I interaction. NetMHCpan was selected as the software for predicting the MHC class I dissociation constants because of its enhanced performance compared with other binding prediction methods (51). NetMHCpan achieved >0.90 AUC across six independent test datasets (59). The MHC class I dissociation constants were significantly lower in immunogenic than nonimmunogenic neoantigens, indicating a higher binding affinity of immunogenic neoantigens to MHC class I (p = 2.088 × 10−7; (Fig. 2G). The binding stability showed significantly higher values for immunogenic over nonimmunogenic neoantigens (p = 3.895 × 10−5; (Fig. 2H).
The ability of the neoantigen to stimulate a T cell response was included in the NeoScore model development using the model created by Łuksza et al. (23) to calculate a characteristic referred to as the TCR recognition probability. The TCR recognition probability is a probabilistic model that considers the sequence similarity between the neoantigen and a known T cell epitope from the Immune Epitope Database as a proxy for the binding affinity of the neoantigen:MHC class I–TCR interaction. The TCR recognition probability showed no statistically significant difference between the immunogenic and nonimmunogenic neoantigens (p = 0.636; (Fig. 2I).
Two methods to account for T cell development were also included in NeoScore development. During maturation in the thymus, T cells expressing TCRs with high avidity to normal human peptides undergo apoptosis. Thus, a neoantigen with a high degree of sequence similarity to a normal human peptide is less likely to elicit a T cell response. The first method used to account for T cell development was the sequence similarity to the closest matched normal human peptide. No statistically significant difference was observed in the sequence similarity for immunogenic and nonimmunogenic neoantigens (p = 0.511; (Fig. 2J). The second method considered was the amplitude, where the dissociation constant of the neoantigen:MHC class I complex is adjusted by the dissociation constant of the closest matched human peptide:MHC class I complex. The amplitude adjusts for the regulation of TCR specificities during T cell maturation but also considers that only a normal human peptide capable of binding an MHC class I molecule will significantly impact the immunogenicity of the neoantigen. The amplitude was not significantly different between immunogenic and nonimmunogenic neoantigens (p = 0.209; (Fig. 2K).
One final characteristic considered in the development of the NeoScore model was the hydrophobicity of the neoantigen. The hydrophobicity of the neoantigen has been proposed to be associated with greater neoantigen immunogenicity because of the hydrophobicity of key binding pockets in the MHC class I binding groove (53). Mixed results have been reported for the association of hydrophobicity and immunogenicity to date (16, 18, 23, 53). One reason is the use of different methods for hydrophobicity, three of which are considered in this study. In the first method, the hydrophobicity is calculated as a fraction of the neoantigen residues that are hydrophobic (16). The TESLA consortium reported a significantly lower hydrophobicity of immunogenic neoantigens compared with nonimmunogenic neoantigens, which is in the opposite direction as expected. Although there was not a statistically significant difference in the neoantigens included in our analysis, the hydrophobicity fraction still had a lower average for immunogenic than nonimmunogenic neoantigens (Fig. 3, p = 0.204). No difference in hydrophobicity was seen in additional datasets from Carreno et al. (7) or Strønen et al. (36), and a significantly higher hydrophobicity of immunogenic neoantigens was observed in the dataset from Ott et al. (Fig. 3, p = 0.0168) (11). The second method calculated the hydrophobicity fraction using the empirical observations from Chowell et al. (53) to determine which amino acids would increase the likelihood of immunogenicity (53). The method using the observations from Chowell et al. (53) considers both hydrophobicity and other chemical properties such as side-chain bulkiness and polarity. No differences in hydrophobicity were seen across the four datasets using the empirical observations from Chowell et al. (53) (Fig. 3). The final method considered the hydrophobicity at the anchor residues. A mutation that changed a previously hydrophobic anchor residue to a hydrophilic residue was given a score of zero, while all other changes or no change were given a score of 1 (23). Although there was no statistically significant difference in hydrophobicity for any of the four datasets using the method of Łuksza et al. (23), three of four datasets showed a greater percentage of immunogenic neoantigens without a loss of hydrophobicity at an anchor residue (Fig. 3). Overall, because the method from Łuksza et al. showed the greatest consistency, only this method was included in development of the NeoScore model.
No method for calculating neoantigen hydrophobicity is consistently associated with immunogenicity. Comparison of the distribution of hydrophobicity values for the TESLA consortium, Carreno et al. (7), Strønen et al. (36), and Ott et al. (11) datasets between immunogenic and nonimmunogenic neoantigens. Hydrophobicity was calculated by three methods. The TESLA consortium method is calculated as the number of hydrophobic amino acids divided by the total number of amino acids. The method based on the Chowell et al. (53) empirical data uses the same calculation scheme but considers both hydrophobicity and other chemical properties, such as side-chain bulkiness and polarity in ranking the neoantigens. The Łuksza et al. (23) method considers the change of an amino acid at the anchor residues compared with the closest matched human peptide. *p < 0.05.
No method for calculating neoantigen hydrophobicity is consistently associated with immunogenicity. Comparison of the distribution of hydrophobicity values for the TESLA consortium, Carreno et al. (7), Strønen et al. (36), and Ott et al. (11) datasets between immunogenic and nonimmunogenic neoantigens. Hydrophobicity was calculated by three methods. The TESLA consortium method is calculated as the number of hydrophobic amino acids divided by the total number of amino acids. The method based on the Chowell et al. (53) empirical data uses the same calculation scheme but considers both hydrophobicity and other chemical properties, such as side-chain bulkiness and polarity in ranking the neoantigens. The Łuksza et al. (23) method considers the change of an amino acid at the anchor residues compared with the closest matched human peptide. *p < 0.05.
Given the inconsistent association of hydrophobicity with immunogenicity, we explored the association of hydrophobicity and immunogenicity when the data were separated out by HLA allele. Published motifs of the peptides that bind to different HLA alleles have demonstrated distinct differences in the conserved amino acids, suggesting that hydrophobicity may play a greater role in determining binding depending on the HLA allele. For example, published motifs for peptides that bind HLA-A01:01 and HLA-A03:01 contain conserved polar and charged amino acids, whereas motifs for peptides that bind HLA-A02:01 contain several conserved hydrophobic amino acids (60). When we assessed the association of hydrophobicity and immunogenicity for each HLA allele independently, immunogenic neoantigens were significantly less hydrophobic than immunogenic neoantigens for HLA-A01:01 using the hydrophobicity method from the TESLA consortium, consistent with the polar and charged conserved amino acids in the peptides that bind this allele (Supplemental Table I). These data suggest that there might be an advantage to separately evaluating the role of the hydrophobicity characteristic in predicting neoantigens for different HLA alleles. However, because of the small sample sizes available, we were not able to incorporate the effect of hydrophobicity on predicting immunogenicity based on the HLA allele.
Next, the degree of correlation between the characteristics calculated for each neoantigen was assessed. Only two characteristics were significantly correlated: the dissociation constant and the binding stability (Fig. 4A). Despite their correlation, there is evidence that the two characteristics both contribute to accurate neoantigen prioritization. Although the dissociation constant assesses the affinity of the interaction between the neoantigen and the MHC class I molecule, the stability predicts the length of time that the neoantigen will remain bound. The importance of binding stability is demonstrated in (Fig. 4B, where clustering of the immunogenic neoantigens in the upper left-hand corner can be observed, indicating the influence of both characteristics in determining immunogenicity.
Binding stability and dissociation constant are significantly correlated, but both contribute to immunogenicity. (A) Correlation between all calculated neoantigen characteristics: mRNA expression (Exp.), VAF, proteasomal cleavage potential (Cle.), TAP, MHC class I to neoantigen dissociation constant (Kd), MHC class I to neoantigen binding stability (Stab.), TCR recognition probability, amplitude, defined as the ratio of the MHC class I binding of the closest matched human peptide and the neoantigen (Amp.), and normalized sequence similarity score for closest matched human peptide and the neoantigen (Sim.). The size of the circles indicates the absolute magnitude of the correlation. (B) Correlation of MHC class I dissociation constant with MHC class I binding stability of the neoantigen. Lines show the linear relationship between the characteristics for immunogenic (black) and nonimmunogenic (gray) neoantigens, respectively. *p < 0.05.
Binding stability and dissociation constant are significantly correlated, but both contribute to immunogenicity. (A) Correlation between all calculated neoantigen characteristics: mRNA expression (Exp.), VAF, proteasomal cleavage potential (Cle.), TAP, MHC class I to neoantigen dissociation constant (Kd), MHC class I to neoantigen binding stability (Stab.), TCR recognition probability, amplitude, defined as the ratio of the MHC class I binding of the closest matched human peptide and the neoantigen (Amp.), and normalized sequence similarity score for closest matched human peptide and the neoantigen (Sim.). The size of the circles indicates the absolute magnitude of the correlation. (B) Correlation of MHC class I dissociation constant with MHC class I binding stability of the neoantigen. Lines show the linear relationship between the characteristics for immunogenic (black) and nonimmunogenic (gray) neoantigens, respectively. *p < 0.05.
Capietto et al. (61) recently found that a different set of neoantigen characteristics will influence immunogenicity, if the mutation occurs in an anchor residue compared with mutations in nonanchor residues. They demonstrated that, if a mutation occurred in an anchor residue, the amplitude had a greater predictive value than the dissociation constant of the neoantigen:MHC class I complex alone. To assess the influence of the mutation position, the distribution of the amplitude and the unadjusted dissociation constant for neoantigens with a mutation in an anchor or a nonanchor residue were analyzed. To maximize the chances of detecting a significant difference with the relatively low number of immunogenic neoantigens derived from mutations in anchor residues (n = 13), the analysis of the impact of mutation’s position was performed across the combination of four datasets (7, 11, 16, 36). As shown in Supplemental Fig. 1, no statistically significant difference was observed for the amplitude with either anchor or nonanchor residue mutations. Although the anchor–residue mutations did have a higher average amplitude in immunogenic neoantigens, the difference was not statistically significant. In contrast, the dissociation constant was significantly lower in immunogenic neoantigens than nonimmunogenic neoantigens. Therefore, there was not a compelling reason to fit separate models for neoantigens with mutations in anchor residues and those with mutations in nonanchor residues. Furthermore, because the amplitude is mathematically dependent on the dissociation constant, the amplitude was not included in the subsequent steps of model development.
Regularized regression approach creates a neoantigen prioritization model, NeoScore, that outperforms existing models that score each neoantigen
Although analysis of the distribution of each characteristic determined those with a statistically significant difference between immunogenic and nonimmunogenic neoantigens, we next assessed the full set of characteristics to determine which influenced the ability to optimally prioritize SNV and small indel-derived immunogenic neoantigens. A regularized regression is a model-based approach to determine the group of characteristics that are important for discriminating between immunogenic and nonimmunogenic neoantigens, regardless of whether each characteristic is statistically significant. A logistic model using an elastic net-based regularization was unable to identify individual characteristics that are most predictive of immunogenicity by optimizing shrinkage penalties, a consequence that is often seen with a small effective sample size. The effective sample size for these regularized regression methods using models from the binomial family is based on the class with the smallest number of observations, which is the immunogenic neoantigens (n = 26). Consequently, a cross-validation approach was applied to select the best subset of neoantigen characteristics. One thousand combinations of 26 nonimmunogenic and 26 immunogenic neoantigens were randomly selected, and the elastic net regularized regression was fit on each combination (Fig. 5A). Performing 1000 random samples allowed for the examination of the impact of neoantigen characteristics on immunogenicity while adjusting for the small effective sample size. The number of times each characteristic was selected out of the 1000 random combinations was tracked. The dissociation constant, binding stability, and mRNA expression were each selected >700 of the 1000 times, while no other characteristic was selected >500 times (Fig. 5B). These fits were consistently able to distinguish immunogenic and nonimmunogenic neoantigens, as demonstrated by the high density of AUC values around the mean of 0.861 (25th percentile, 0.828; 75th percentile, 0.895) (Fig. 5C).
Regularized regression approach selects dissociation constant, expression, and stability as the characteristics of greatest importance in prioritizing immunogenicity. Regularized (elastic net) regression approach to the selection of neoantigen characteristics. (A) Process used for characteristic selection. Twenty-six immunogenic and 26 randomly selected nonimmunogenic neoantigens were isolated 1000 times and fit with an elastic net regression. For every fit, the characteristics selected by the model, as well as its performance, were tracked. (B) The number of times each characteristic was selected out of 1000 runs. The characteristics included are as follows: MHC class I to neoantigen dissociation constant (Kd), mRNA expression (Expression), MHC class I to neoantigen binding stability (Stability), VAF, hydrophobicity of the anchor residues (Hydro), TAP, TCR recognition probability, normalized sequence similarity score for closest matched human peptide and the neoantigen (Similarity), and proteasomal cleavage potential (Cleavage). (C) Distribution of the AUC values for each of the 1000 fits. The solid black line indicates the mean AUC, and the dashed black lines represent the 25th and 75th percentile, respectively.
Regularized regression approach selects dissociation constant, expression, and stability as the characteristics of greatest importance in prioritizing immunogenicity. Regularized (elastic net) regression approach to the selection of neoantigen characteristics. (A) Process used for characteristic selection. Twenty-six immunogenic and 26 randomly selected nonimmunogenic neoantigens were isolated 1000 times and fit with an elastic net regression. For every fit, the characteristics selected by the model, as well as its performance, were tracked. (B) The number of times each characteristic was selected out of 1000 runs. The characteristics included are as follows: MHC class I to neoantigen dissociation constant (Kd), mRNA expression (Expression), MHC class I to neoantigen binding stability (Stability), VAF, hydrophobicity of the anchor residues (Hydro), TAP, TCR recognition probability, normalized sequence similarity score for closest matched human peptide and the neoantigen (Similarity), and proteasomal cleavage potential (Cleavage). (C) Distribution of the AUC values for each of the 1000 fits. The solid black line indicates the mean AUC, and the dashed black lines represent the 25th and 75th percentile, respectively.
Based on the results of the model-based regression approach, a logistic regression model was fit with the dissociation constant, binding stability, and expression in the TESLA consortium data. The final logistic regression model will be called the “NeoScore.” The equation for the model is as follows:
where E is the scaled, log10-transformed expression value, Kd is the log10-transformed dissociation constant in units of nanomolar (nM), and S is the log10-transformed stability measured as the half-life of the binding interaction in units of hours. The coefficients for expression and stability were both positive, as expected because higher expression and longer binding times likely increase the chance of a neoantigen to elicit an immune response. The coefficient for the dissociation constant was negative, as expected because a lower dissociation constant indicates a higher binding affinity. These coefficients match the observed directions of change from (Fig. 2. Raw data from NetMHCpan, NetMHCstabpan, and Salmon can be processed and combined to return a set of neoantigens prioritized by their NeoScore using the following web application: https://bordene.shinyapps.io/MHCI_neoantigen_prioritization/.
Given the large discrepancy between the number of immunogenic and nonimmunogenic neoantigens, we assessed the impact of changing the ratio of immunogenic to nonimmunogenic neoantigens on the performance of the model. This was done by fitting the logistic regression model on 100 random down-sampled datasets from the TESLA consortium data at 10 different ratios of immunogenic to nonimmunogenic neoantigens. Each of the logistic regression models was then applied to the Cohen dataset, and the optimism was calculated by subtracting the AUC on the Cohen dataset from the AUC on the TESLA dataset. The data demonstrated that decreasing the sample size of either immunogenic or nonimmunogenic neoantigens increased the optimism, suggesting a less generalizable model. The lowest optimism was obtained when the maximal numbers of both immunogenic and nonimmunogenic neoantigens were used, even though there were not equivalent numbers of immunogenic to nonimmunogenic neoantigens (Supplemental Fig. 2).
Once the NeoScore model was fit in the TESLA dataset, model performance was assessed in both the TESLA training dataset and four independent test datasets. In the TESLA consortium dataset, the NeoScore had an AUC of 0.845, which exceeds the AUC of 0.70 needed to be considered a discriminatory model (62). The NeoScore also outperformed the AUC of the Łuksza model (AUC = 0.615) (Table II, Fig. 6A). The NeoScore was then tested in four additional datasets. The NeoScore outperformed the Łuksza model in the Cohen (0.832 AUC versus 0.689 AUC) and Strønen datasets (0.681 AUC versus 0.620 AUC) (35, 36) (Table II, Fig. 6B, 6C). In the Carreno dataset, the NeoScore slightly outperformed Łuksza and the pTuneos hydrophobicity model (0.704 AUC for NeoScore, 0.696 AUC for pTuneos hydrophobicity, and 0.657 AUC for Łuksza) (Table II, Fig. 6D). Published results of the immunogenicity scores from the pTuneos model were used (18), because the model was not able to be successfully run with the other datasets. Both the hydrophobicity-only model and the full model provided by pTuneos are included, because the hydrophobicity model outperformed the full model. Similarly, in the Ott dataset, the NeoScore slightly outperformed the Łuksza model (0.609 AUC for NeoScore and 0.575 AUC for Łuksza) (11) (Table II, Fig. 6E).
Comparison of the AUC and optimal threshold sensitivities and specificities for the prediction of immunogenic neoantigens using NeoScore versus other models
Dataset . | Model . | Sensitivity [95% CI] . | Specificity [95% CI] . | AUC . |
---|---|---|---|---|
TESLA consortium (n = 347) (16) | NeoScore | 0.846 [0.769–1.00] | 0.738 [0.523–0.844] | 0.845 |
Abbreviated NeoScorea | 0.923 [0.731–1.00] | 0.567 [0.530–0.760] | 0.772 | |
Łuksza | 0.692 [0.462–0.885] | 0.561 [0.439–0.763] | 0.615 | |
TESLA consortium | 0.385 | 0.941 | — | |
Cohen (n = 357) (35) | NeoScore | 0.857 [0.571–1.00] | 0.546 [0.494–0.597] | 0.832 |
Abbreviated NeoScorea | 1.00 [1.00–1.00] | 0.363 [0.311–0.414] | 0.744 | |
Łuksza | 1.00 [1.00–1.00] | 0.254 [0.211–0.303] | 0.689 | |
TESLA consortium | 0.571 | 0.857 | — | |
Strønen (n = 57) (36) | NeoScore | 0.364 [0.091–0.636] | 0.889 [0.800–0.978] | 0.681 |
Abbreviated NeoScorea | 0.909 [0.727–1.00] | 0.644 [0.511–0.778] | 0.794 | |
Łuksza | 0.727 [0.455–1.00] | 0.422 [0.289–0.556] | 0.620 | |
TESLA consortium | 0.364 | 0.867 | — | |
Carreno (n = 21) (7) | NeoScore | 0.444 [0.111–0.778] | 0.750 [0.500–1.00] | 0.704 |
Abbreviated NeoScorea | 0.778 [0.556–1.00] | 0.833 [0.583–1.00] | 0.935 | |
Łuksza | 0.778 [0.553–1.00] | 0.583 [0.333–0.833] | 0.657 | |
TESLA consortium | 0.222 | 1.00 | — | |
pTuneos hydrophobicity | 0.500 [0.125–1.00] | 0.857 [0.429–1.00] | 0.696 | |
pTuneos full model | 0.250 [0.125–1.00] | 1.00 [0.143–1.00] | 0.536 | |
Ott (n = 165) (11) | NeoScore | 0.222 [0.056–0.389] | 0.878 [0.823–0.932] | 0.609 |
Abbreviated NeoScorea | 0.389 [0.167–0.611] | 0.782 [0.714–0.844] | 0.597 | |
Łuksza | 0.626 [0.544–0.701] | 0.500 [0.278–0.722] | 0.575 | |
TESLA consortium | 0.056 | 0.959 | — |
Dataset . | Model . | Sensitivity [95% CI] . | Specificity [95% CI] . | AUC . |
---|---|---|---|---|
TESLA consortium (n = 347) (16) | NeoScore | 0.846 [0.769–1.00] | 0.738 [0.523–0.844] | 0.845 |
Abbreviated NeoScorea | 0.923 [0.731–1.00] | 0.567 [0.530–0.760] | 0.772 | |
Łuksza | 0.692 [0.462–0.885] | 0.561 [0.439–0.763] | 0.615 | |
TESLA consortium | 0.385 | 0.941 | — | |
Cohen (n = 357) (35) | NeoScore | 0.857 [0.571–1.00] | 0.546 [0.494–0.597] | 0.832 |
Abbreviated NeoScorea | 1.00 [1.00–1.00] | 0.363 [0.311–0.414] | 0.744 | |
Łuksza | 1.00 [1.00–1.00] | 0.254 [0.211–0.303] | 0.689 | |
TESLA consortium | 0.571 | 0.857 | — | |
Strønen (n = 57) (36) | NeoScore | 0.364 [0.091–0.636] | 0.889 [0.800–0.978] | 0.681 |
Abbreviated NeoScorea | 0.909 [0.727–1.00] | 0.644 [0.511–0.778] | 0.794 | |
Łuksza | 0.727 [0.455–1.00] | 0.422 [0.289–0.556] | 0.620 | |
TESLA consortium | 0.364 | 0.867 | — | |
Carreno (n = 21) (7) | NeoScore | 0.444 [0.111–0.778] | 0.750 [0.500–1.00] | 0.704 |
Abbreviated NeoScorea | 0.778 [0.556–1.00] | 0.833 [0.583–1.00] | 0.935 | |
Łuksza | 0.778 [0.553–1.00] | 0.583 [0.333–0.833] | 0.657 | |
TESLA consortium | 0.222 | 1.00 | — | |
pTuneos hydrophobicity | 0.500 [0.125–1.00] | 0.857 [0.429–1.00] | 0.696 | |
pTuneos full model | 0.250 [0.125–1.00] | 1.00 [0.143–1.00] | 0.536 | |
Ott (n = 165) (11) | NeoScore | 0.222 [0.056–0.389] | 0.878 [0.823–0.932] | 0.609 |
Abbreviated NeoScorea | 0.389 [0.167–0.611] | 0.782 [0.714–0.844] | 0.597 | |
Łuksza | 0.626 [0.544–0.701] | 0.500 [0.278–0.722] | 0.575 | |
TESLA consortium | 0.056 | 0.959 | — |
The abbreviated NeoScore omits expression.
NeoScore outperforms the model by Łuksza et al. (23) and performs equivalently to the model by the TESLA consortium (16). Comparison of the AUC for the NeoScore and abbreviated NeoScore compared with the models by Łuksza, the TESLA consortium, and pTuneos (18). (A) Performance of the NeoScore and abbreviated NeoScore in the TESLA consortium dataset (the dataset in which the model was fit). Comparison of the NeoScore and abbreviated NeoScore with existing models in the (B) Cohen dataset (35), (C) Strønen dataset (36), (D) Carreno dataset (7), and (E) Ott dataset (11).
NeoScore outperforms the model by Łuksza et al. (23) and performs equivalently to the model by the TESLA consortium (16). Comparison of the AUC for the NeoScore and abbreviated NeoScore compared with the models by Łuksza, the TESLA consortium, and pTuneos (18). (A) Performance of the NeoScore and abbreviated NeoScore in the TESLA consortium dataset (the dataset in which the model was fit). Comparison of the NeoScore and abbreviated NeoScore with existing models in the (B) Cohen dataset (35), (C) Strønen dataset (36), (D) Carreno dataset (7), and (E) Ott dataset (11).
Because the model by the TESLA consortium consists of a single set of recommended thresholds for the neoantigen:MHC class I dissociation constant, neoantigen:MHC class I binding stability, and expression, the NeoScore could not be compared with the TESLA consortium model in terms of the AUC. Therefore, an optimal threshold for the NeoScore was selected that maximized the sum of the sensitivity and specificity in the TESLA dataset and classified the NeoScore into high and low immunogenicity. The reported sensitivity and specificity for each dataset are based on the threshold optimized in the TESLA dataset (−2.478). In the TESLA and Cohen datasets, the NeoScore obtained a greater sensitivity with a lower specificity at the optimal cutpoint compared with the model by the TESLA consortium. Across all the remaining datasets, the sensitivity and specificity of the NeoScore were statistically equivalent to that achieved by the TESLA consortium, as demonstrated by the overlap of the 95% confidence interval (CI; Table II).
Because only a subset of neoantigens was tested for immunogenicity by the TESLA consortium, we assessed the full set of neoantigens predicted to be immunogenic by the NeoScore model. All possible 9mer and 10mer neoantigens were generated from each mutation across the five tumors in the TESLA dataset, and the immunogenicity of each neoantigen was prioritized by the NeoScore model. To make the analysis comparable with that done by the TESLA consortium, we assessed the neoantigens for their predicted immunogenicity in the context of the HLA types that were tested by the TESLA consortium. The sensitivity and specificity for the validated immunogenic neoantigens were as reported in Table I. The NeoScore predicted 740 additional immunogenic neoantigens across the five tumors that had not been tested by the TESLA consortium (range, 54–310 per patient; data not shown). The large number of untested neoantigens is consistent with the low overlap observed between groups in the original TESLA consortium submissions. Among the 25 submissions, a median of 13% overlap was observed between the neoantigens predicted to be immunogenic (16). The low overlap and large number of untested candidates support the need for further validation of neoantigen prioritization models.
Despite the high performance of the NeoScore in the TESLA and Cohen datasets, there is a marked decrease in the performance in the Carreno et al. (7), Strønen et al. (36), and Ott et al. (11) datasets. A likely cause for the decreased performance is how the immunogenicity was tested in these datasets. TESLA and Cohen both tested for reactive T cells present in the patient with no additional T cell stimulation. In contrast, Carreno et al. (7) administered a dendritic cell vaccine with each of the predicted neoantigens and subsequently tested for an immune response to each neoantigen. Strønen et al. (36) exposed PBMCs from healthy patients to dendritic cells transfected with the neoantigen of interest. They then tested for an immune response to those neoantigens. Ott et al. (11) immunized patients with pools of long synthetic peptides and then tested for an immune response to each neoantigen that could be generated from the long peptides. None of these three methods rely on the expression of the neoantigen in the tumor to activate neoantigen-specific T cells. Therefore, a logistic regression model was fit to the TESLA dataset using only the neoantigen:MHC class I binding stability and dissociation constant. The following abbreviated NeoScore model was obtained:
The coefficients obtained for stability and dissociation constant are comparable with those obtained in the full NeoScore model. The threshold for the abbreviated NeoScore was optimized in the TESLA dataset (−2.856) and then tested in the Cohen et al. (35), Strønen et al. (36), Carreno et al. (7), and Ott et al. (11) datasets. As expected, the abbreviated NeoScore underperformed compared with the full NeoScore model in Cohen and TESLA, but outperformed the full NeoScore model in both Carreno et al. (7) and Strønen et al. (36) (Table II; (Fig. 6). These results suggest that expression predicts neoantigen immunogenicity when priming of T cell responses is dependent on expression of the neoantigen by the tumor. However, when the T cells are stimulated independently of expression by the tumor, the predictive benefit of expression is undermined. Similarly, the performance of the TESLA consortium model, which relies on neoantigen expression, distinctly declines in the Carreno et al. (7) and Strønen et al. (36) datasets (Table II; (Fig. 6). The poor performance in the Ott et al. (11) dataset across models remains unexplained. The Ott et al. (11) dataset did not significantly differ from the other datasets in terms of the distribution of the location of the mutations (anchor versus nonanchor residues) or the general distribution of the characteristics (data not shown). Overall, elimination of expression enhanced the performance of the NeoScore model when considering the immune response stimulated independently of the tumor.
To further reaffirm the subset of neoantigen characteristics selected, a logistic regression model was fit using all nine neoantigen characteristics from (Fig. 5B, which did not notably improve the AUC (0.853 for the model with all nine characteristics compared with 0.845 for the NeoScore; data not shown). Furthermore, the optimism is much higher for the model that included all neoantigen characteristics (8.64%) than the NeoScore (2.47%). The optimism indicates the likelihood that the model is overfitting the training data, which would make a less generalizable model. The lack of benefit from the additional characteristics adds support that the subset of characteristics selected is optimal for prioritizing immunogenic neoantigens. Overall, the model with the neoantigen:MHC class I dissociation constant and binding stability and mRNA expression showed the best potential to consistently separate immunogenic and nonimmunogenic neoantigens in validated datasets.
A high maximum NeoScore has a significant association with improved survival in cutaneous melanoma patients treated with immunotherapy
Once the ability of the NeoScore to discriminate immunogenic from nonimmunogenic neoantigens with high sensitivity and specificity was established, the association of a single, strongly immunogenic neoantigen with survival in response to immune checkpoint inhibition was assessed. The survival analysis was performed across two datasets, one with treatment with an anti–CTLA-4 mAb (21) and the other with anti–PD-1 mAbs (29). In both datasets, the cohort was restricted to immunotherapy-naive patients with cutaneous melanoma (21, 29), which allowed us to assess the factors that drive response to immune checkpoint inhibition in melanoma with a high tumor mutational burden and no previous immunoediting. Although the range of mutational burden in cutaneous melanoma is wide [1368–33,591 for the Van Allen et al. (21) dataset and 864–24,292 for the Liu et al. (29) dataset], all samples have a high mutational burden because of UV-induced DNA mutations. The final cohort sizes were 34 for Van Allen et al. (21) and 53 for Liu et al. (29). However, the statistical power of the survival analysis is limited by the number of deaths observed in each dataset, resulting in an effective sample size of 22 for Van Allen et al. (21) and 20 for Liu et al. (29). The NeoScore for each individual neoantigen per tumor was calculated. Because each patient has many neoantigens with a wide range of NeoScores, there are several potential ways to summarize the neoantigen profile for the sake of comparison between patients. Three ways were selected in this study: (1) the mutational burden; (2) the neoantigen burden; and (3) the highest NeoScore for a neoantigen from the patient, referred to as the “maximum NeoScore.” The mutational burden and neoantigen burden were attempted for the sake of comparison with the literature, while the maximum NeoScore was used in accordance with the principle that a single immunogenic neoantigen can drive the immune response. Survival analysis was then performed separately for the mutational burden, neoantigen burden, and the maximum NeoScore.
Although an optimal NeoScore threshold was determined for the prediction of neoantigen immunogenicity, every patient had at least one neoantigen that exceeded the optimized threshold, indicating that the presence of a predicted immunogenic neoantigen was not sufficient to differentiate the response to immune checkpoint inhibition. Therefore, unique optimal thresholds for survival analysis were determined for the maximum NeoScore using maximally ranked statistics implemented in the MaxStat package and R statistical software (58). Maximally ranked statistics methods implement a search over all possible log-rank test statistics based on thresholds of a predictor variable (here the maximum NeoScore) for the largest standardized log-rank test statistic. Because maximally ranked statistic methods optimize the threshold to detect a difference, performance of the NeoScore and optimal threshold will need to be validated in an independent dataset.
Next, the association of mutational burden and neoantigen burden with survival in response to immune checkpoint inhibition was evaluated. For the sake of consistency, maximally ranked statistics were also used to select the optimal threshold for the mutational burden and neoantigen burden. In the Van Allen et al. (21) dataset, there was no association between mutational burden and progression-free survival in response to immune checkpoint inhibition (p = 0.37) (Fig. 7A). In the Liu et al. (29) dataset, a high tumor mutational burden was significantly associated with poor progression-free survival (p = 0.047) (Fig. 7B), which is consistent with the finding of Liu et al. (29). The neoantigen burden was strongly correlated with the mutational burden in both datasets (Fig. 7C, 7D). On survival analysis, the same results were observed for neoantigen burden as for mutational burden, where a high neoantigen burden was not associated with improved progression-free survival in the Van Allen et al. (21) dataset and was associated with decreased progression-free survival in the Liu et al. (29) dataset (data not shown).
High maximum NeoScore has improved association with survival compared with mutational burden. Comparison of progression-free survival probability within treatment-naive, cutaneous melanoma patients. All splits between the groups were determined using maximally ranked statistics, and p values were calculated using a log-rank test. A high tumor mutational burden is (A) not significantly associated with survival in the Van Allen et al. (21) data and (B) is significantly associated with decreased survival in the Liu et al. (29) data. Mutational burden directly correlates with neoantigen burden in (C) the Van Allen data and (D) the Liu data. Patients with a high maximum (max.) NeoScore have (E) significantly increased survival in the Van Allen data and (F) significantly increased survival in the Liu dataset.
High maximum NeoScore has improved association with survival compared with mutational burden. Comparison of progression-free survival probability within treatment-naive, cutaneous melanoma patients. All splits between the groups were determined using maximally ranked statistics, and p values were calculated using a log-rank test. A high tumor mutational burden is (A) not significantly associated with survival in the Van Allen et al. (21) data and (B) is significantly associated with decreased survival in the Liu et al. (29) data. Mutational burden directly correlates with neoantigen burden in (C) the Van Allen data and (D) the Liu data. Patients with a high maximum (max.) NeoScore have (E) significantly increased survival in the Van Allen data and (F) significantly increased survival in the Liu dataset.
When tested with the optimal threshold, patients in the Van Allen et al. (21) dataset with a high maximum NeoScore (>−0.525) had significantly improved progression-free survival (p = 3.5 × 10−4) (Fig. 7E). Similarly, in the Liu et al. (29) dataset, patients with a high maximum NeoScore (>−0.152) had significantly improved progression-free survival (p = 8.2 × 10−4) (Fig. 7F). In both the Van Allen and Liu datasets, a high maximum NeoScore was associated with significantly improved overall survival at the same thresholds (p = 0.002 and p = 0.013, respectively; data not shown). Overall, these results suggest an improved association of the NeoScore with survival after treatment with immunotherapy, compared with tumor mutational burden, in tumors with high tumor mutational burden.
Given the added expense of RNAseq data in a clinical setting, we assessed whether the abbreviated NeoScore would have a similar association with progression-free survival in response to immunotherapy. Therefore, we analyzed the association of the maximum abbreviated NeoScore with progression-free survival in response to treatment in the Liu et al. (29) and Van Allen et al. (21) datasets, as well as an additional lung cancer dataset with no available RNAseq data from Rizvi et al. (25). Although the abbreviated NeoScore demonstrated a significant association with progression-free survival in the Van Allen et al. (21) dataset, there was no significant association of the abbreviated NeoScore with progression-free survival in either of the other test datasets (Supplemental Fig. 3). The loss of a significant association in the absence of the expression characteristic further emphasizes the importance of expression to the clinical relevance of the NeoScore model.
Discussion
Prioritization of immunogenic neoantigens is critical for applications to both the development of personalized vaccines and the identification of patients who are likely to benefit from treatment with immune checkpoint inhibition. However, many neoantigen characteristics have been suggested in the literature to date with no consensus on which characteristics impact whether the neoantigen generates a T cell response. In addition, no model has demonstrated both the ability to score each neoantigen with high sensitivity and specificity and significant association with survival in response to immune checkpoint inhibition. The successes of this study are as follows: (1) identification of those neoantigen characteristics of greatest importance in determining neoantigen immunogenicity; (2) the combination of these characteristics into a single overall immunogenicity score, the NeoScore, with practical applications to personalized vaccine development; (3) integration of the NeoScore into a web application for easy use; and (4) demonstration of the clinical significance of the NeoScore in melanoma.
A model-based statistical prediction approach was used to select the characteristics of SNV and small indel-derived neoantigens that were most predictive of immunogenicity. The dissociation constant and binding stability of the neoantigen:MHC class I complex and the expression were the combination of neoantigen characteristics best able to discriminate between immunogenic and nonimmunogenic neoantigens. Although the identification of these three characteristics is consistent with the recent findings of the TESLA consortium (16), it is important to note that the approaches taken by our group and the original group differed in several key ways. First, a completely agnostic approach was applied to the selection of characteristics, including all characteristics that have been suggested in the literature to date, whereas the TESLA consortium began with those shown to have statistical significance. Taking a completely agnostic approach ensured the greatest potential to select the combination of characteristics that maximized the separation of immunogenic and nonimmunogenic neoantigens. Second, additional characteristics were included that were not considered by the TESLA consortium, including VAF and sequence similarity to the closest matched human peptide. Finally, these results were expanded by combining the characteristics into an overall immunogenicity score, the NeoScore. A web application has been made available to calculate the NeoScore or the abbreviated NeoScore and provide a list of neoantigens prioritized by their predicted immunogenicity. The web application is expected to streamline the application of NeoScore for research purposes.
One of the key advantages of the NeoScore is that it allows for the prioritization of neoantigens that may not exceed the single set of thresholds provided by the TESLA consortium. Although a threshold for the NeoScore was optimized in the TESLA dataset and demonstrated strong performance across the test datasets, the optimized threshold is necessarily conservative for two reasons. First, all datasets used to build and test the NeoScore consisted of neoantigens that had already been prioritized by the original group. Thus, the NeoScore is trained to discriminate between top candidates. Second, the NeoScore is trained on the natural T cell responses to neoantigens with no stimulation by a therapeutic agent. Treatment with immunotherapy or personalized vaccines may be able to elicit an immune response to neoantigens with a lower NeoScore. Therefore, all neoantigens can be ranked in order of their NeoScore. Researchers applying NeoScore to a new dataset can first rank the predicted neoantigens, then decide on the optimal number of neoantigens to test for a patient based on the unique neoantigen profile of the given patient.
One consideration for future applications of the NeoScore is that it is based on a combination of predictive tools that each comes with its own sensitivity and specificity. Although we have selected highly rated tools for MHC class I dissociation constants and binding stability, these tools are constantly improving. One example is that the MHC class I binding stability tool that we employed is trained on stability data for 25,000 peptides, which is more than an order of magnitude less than the training data available for MHC class I dissociation constant predictions (52). As expanded training data becomes available, the predictive value of each tool is likely to increase, which, in turn, will further enhance the predictive value of integrated models such as the NeoScore. Additionally, some recent work has suggested the potential for combining predictions from multiple tools in either a consensus or aggregate approach to further enhance the predictive value of each individual tool (63). Further research is needed into how to optimally aggregate the scores from multiple tools to enhance their application. However, the high performance of the NeoScore in predicting immunogenicity of individual neoantigens and significant association with survival in response to immune checkpoint inhibitors indicates the high performance of each of the individual tools combined to create the NeoScore.
A surprising finding is the lack of association between mutational burden and progression-free survival in the Van Allen et al. (21) dataset and the association of increased mutational burden with decreased progression-free survival in the Liu et al. (29) dataset. These results are inconsistent with the association of increased mutation burden with increased response rate to immune checkpoint inhibition observed across cancer types (27). An explanation for the lack of association of mutation burden with survival after treatment with immune checkpoint inhibition is that the cutaneous melanoma subset consists of all tumors with a high mutational burden. A tumor with a particularly low mutational burden may not have any neoantigens, causing it to have a poor response to immune checkpoint inhibition. However, a high mutational burden alone does not guarantee a good response to immune checkpoint inhibition. As demonstrated by our work, a high maximum NeoScore has an improved association with progression-free survival compared with mutational burden.
The literature to date supports that mutational burden is associated with response to therapy across cancers (27) or across cancer subtypes, but not within tumors with a high mutational burden. Within non–small cell lung cancer, there was a significant increase in response rates and survival in patients with a higher mutational burden than those with a lower mutational burden (25). These results reflect a split between the patients with a mutational signature from smoking carcinogens and those with no evidence of exposure to smoking carcinogens. In contrast, within small cell lung cancers, which are nearly universally associated with smoking, there was a weaker association of mutational burden with response to therapy (22). Three independent studies demonstrated a significant association between high mutational burden and improved response to immune checkpoint inhibition with either anti–CTLA-4 or anti–PD-1 mAb treatment in melanoma patients (21, 26, 29). However, these studies included cutaneous, occult, acral, and mucosal melanoma, which may have different mutational profiles. As demonstrated in this study, there is no association between high mutational burden and improved progression-free survival in the Van Allen et al. (21) and Liu et al. (29) datasets when restricted to cutaneous melanoma. As noted by Snyder et al. (26), the patient in their dataset with the highest number of mutations had minimal or no benefit from anti–CTLA-4 mAb treatment. Overall, in combination with prior studies, our results highlight the importance of considering the immunogenicity of the neoantigen in predicting the response to immune checkpoint inhibition.
Despite the successes of our work to date, there are several limitations that highlight areas for future research. For one, the maximum immunogenicity score is not able to account for the response to therapy of all patients. One possible reason that the NeoScore is not able to fully predict treatment response is that the model did not include neoantigens derived from gene fusions, products of noncanonical open reading frames, canonical open reading frames with a frameshift, or large indels. To our knowledge, there is no available dataset that has validated neoantigens derived from these sources. Given that recent work has demonstrated that a single neoantigen from a gene fusion product can drive complete tumor regression (64) and noncanonical proteins disproportionately generate MHC class I binding neoantigens (65), patients misclassified by our model may have had neoantigens from one of these classes of mutations. In addition, because each of the validated datasets tested the immunogenicity of neoantigens that had already been prioritized by the original group, there may be classes of neoantigens that are immunogenic but were not included in any test dataset. Consideration of additional classes of neoantigens is particularly important given recent evidence that there are classes of neoantigens that have very low binding to MHC class I (66). The inclusion of neoantigens derived from a broader set of mutations may alter the neoantigen characteristics selected as important by our regularized regression approach. For example, neoantigens derived from gene fusions, large indels, or frameshifts are likely to have less sequence similarity to normal human peptides than an SNV or small indel-derived neoantigen, causing characteristics such as the sequence similarity or amplitude to be of greater importance. The need to consider additional types of mutation underscores the importance of generating a validated dataset, including these additional mutations, and repeating characteristic selection.
A second reason that the NeoScore may not be able to account for the response to therapy of all patients is that the model does not consider whether the neoantigens ranked as immunogenic were able to elicit a CD4+ T cell response. Studies have observed that the most effective vaccines are those with neoantigens that elicit combined CD4+ and CD8+ T cell responses (67–71). Additionally, work by Alspach et al. (67) demonstrated the need for MHC class II–restricted neoantigen expression by tumor cells to elicit CD4+ T cell responses and generate effective antitumor immune responses both in the absence of therapy and in response to immunotherapy. What is still unknown is whether there is any advantage to having a single neoantigen that elicits both a CD8+ and CD4+ T cell–mediated response compared with two independent neoantigens that elicit CD8+ and CD4+ T cell responses independently. If the ideal circumstance is a single neoantigen capable of stimulating a combined response, an efficient approach may be to test the top MHC class I–restricted neoantigens for their potential to elicit a CD4+ T cell response. However, if two separate neoantigens are equally effective, a separate model for the prioritization of MHC class II–restricted neoantigens may be of greater clinical utility. Overall, there is a need for improved understanding of the interplay between MHC class I– and II–restricted neoantigens in stimulating an effective antitumor immune response.
This work has successfully identified the key neoantigen characteristics associated with neoantigen immunogenicity using an agnostic approach that considered all the characteristics suggested by the literature to date. These characteristics have been integrated into a single, overall score, the NeoScore, that predicts the immunogenicity of each neoantigen with high sensitivity and specificity. Finally, a high maximum NeoScore has a significant association with improved survival in response to treatment with immune checkpoint inhibition in cutaneous melanoma. The NeoScore is anticipated to improve neoantigen prioritization for the development of personalized vaccines and the determination of which patients are likely to respond to immunotherapy.
Acknowledgements
We thank Dr. Tanya N. Phung for advice and assistance in comparing the software for the identification of SNVs and indels and integrating the pipeline from raw WES data to potential neoantigens. We thank Dr. Chi Zhou for assistance with the pTuneos software. We thank Anngela Adams for assistance in polishing figures.
Footnotes
This work was supported by the Springboard Initiative from the University of Arizona College of Medicine-Phoenix (K.T.H.), the University of Arizona College of Medicine-Phoenix M.D./Ph.D. Program (E.S.B.), and a Medical Student Award from the Melanoma Research Foundation (MRF) Medical Student Award (E.S.B.). This work benefited from support for a related project by Merit Review Award I01-BX005336 from the U.S. Department of Veterans Affairs (VA), Biomedical Laboratory Research and Development Service (to K.T.H.).
Conceptualization: E.S.B., K.H.B., M.A.W., and K.T.H. Methodology: E.S.B., S.G., B.J.L., M.A.W., and K.T.H. Software: E.S.B. Formal analysis: E.S.B. and B.J.L. Investigation: E.S.B. Resources: K.H.B., M.A.W., and K.T.H. Writing – original draft: E.S.B. and K.T.H. Writing – reviewing and editing: E.S.B., K.H.B., B.J.L., M.A.W., and K.T.H. Visualization: E.S.B. Supervision: K.T.H. Funding acquisition: K.T.H.
The contents do not represent the views of the Department of Veterans Affairs or the U.S. Government.
The online version of this article contains supplemental material.
Abbreviations used in this article:
References
Disclosures
B.J.L. has received financial support from Cofactor Genomics and Iron Horse Dx. P.C. The other authors have no financial conflicts of interest.