This article describes Neopepsee, a new algorithm for immune peptide discovery that provides annotated sequences of candidate peptides along with predicted immunogenicity values. It also discusses Neopepsee's performance relative to other methods on validated datasets, where it shows improved performance and high accuracy in two cancer types: melanoma and chronic lymphocytic leukemia.
Detects neoantigen candidates with fewer false positives
A high-quality neoantigen detection algorithm should detect neoantigen candidates with few false positives. The deFuse-Trinity comparison program is designed to detect fusions with sufficient read depth and span counts to predict which protein is a good candidate for vaccine development. The data should be derived from OS tumors, which are known to be highly heterogeneous. High-expression fusions are therefore more likely to represent a higher proportion of the tumor cell population. To find enough neoantigen candidates, at least 40 million paired-end reads should be collected.
Using a neopeptide library, a neoantigen candidate can be identified based on its dissimilarity to self peptides or its similarity to known immunogenic peptides. It also helps reduce the number of false positives due to the lack of pre-trained mice. A list of neoantigen candidates should contain fewer than 10% false positives to improve the chance that its members are recognized.
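To make the idea of dissimilarity to self concrete, here is a minimal sketch. It assumes a simple position-by-position (Hamming) comparison between peptides of equal length; this is an illustrative stand-in, not the scoring that Neopepsee or any other tool named here actually uses. The function names and the example sequences are hypothetical.

```python
def hamming_dissimilarity(candidate: str, reference: str) -> float:
    """Fraction of positions at which two equal-length peptides differ."""
    if len(candidate) != len(reference):
        raise ValueError("peptides must have the same length")
    if not candidate:
        raise ValueError("peptides must be non-empty")
    mismatches = sum(a != b for a, b in zip(candidate, reference))
    return mismatches / len(candidate)


def min_dissimilarity_to_self(candidate: str, self_peptides: list[str]) -> float:
    """Smallest dissimilarity between the candidate and any same-length self peptide."""
    same_length = [p for p in self_peptides if len(p) == len(candidate)]
    if not same_length:
        return 1.0  # nothing comparable: treat the candidate as maximally dissimilar
    return min(hamming_dissimilarity(candidate, p) for p in same_length)


# Made-up 9-mers for illustration only; they are not real self peptides.
self_peptides = ["ACDEFGHIK", "LMNPQRSTV"]
candidate = "ACDEFGHLK"  # differs from the first self peptide at one position
score = min_dissimilarity_to_self(candidate, self_peptides)
```

A score close to zero means the candidate is nearly identical to some self peptide. A real pipeline would more likely use an alignment-based or substitution-matrix-based measure than a raw mismatch count.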
Using unbiased immunopeptidomics, myNEO’s novel approach to detecting neoantigen candidates has several advantages. First, it narrows down the list of candidate neoantigens with fewer false positives. Second, MS-based immunopeptidomics directly interrogates HLA-bound peptides and post-translational modifications. Third, it detects non-canonical peptides, such as those derived from alternative open reading frames, intronic sequences, or 5′ UTRs.
RNAseq and tumor WES have complementary advantages. WES can identify tumor-specific neoantigens by analyzing the expression of candidate neoantigens. RNAseq provides additional information, such as amplification frequency and expression level. RNAseq data are also used to detect low-frequency somatic mutations.
Beyond the exome-based method, RNAseq also enables the identification of peptides derived from RNA editing. Mutations identified from WES data can be assigned to tumor DNA, whereas variants seen only in RNAseq can be assigned only to tumor RNA, and epitopes derived from edited RNA cannot be regarded as candidate neoantigens. Nevertheless, tumor RNAseq provides a broader landscape of candidate neoantigens.
High-throughput NGS can identify neoantigen candidates. Then, computational algorithms can predict their MHC-peptide binding affinity, which can be used to narrow down the candidate list and prioritize candidates with the highest likelihood of inducing tumor-specific T-cell responses. Using in-silico neoantigen prediction, we can select a small number of neoantigen candidates and confirm their antigenicity in biological assays.
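The narrowing step described above can be sketched as a filter followed by a sort. This is a minimal illustration under stated assumptions: binding affinity is represented as a predicted IC50 in nM, and the 500 nM cutoff is a commonly used convention for calling a peptide a binder, not a value given in this article. The `Candidate` class, the `prioritize` function, and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    peptide: str
    hla_allele: str
    ic50_nm: float  # predicted binding affinity; lower means stronger binding


def prioritize(candidates: list[Candidate],
               max_ic50_nm: float = 500.0,
               top_n: Optional[int] = None) -> list[Candidate]:
    """Keep predicted binders and rank them from strongest to weakest."""
    binders = [c for c in candidates if c.ic50_nm <= max_ic50_nm]
    ranked = sorted(binders, key=lambda c: c.ic50_nm)
    return ranked if top_n is None else ranked[:top_n]


# Made-up predictions for illustration only.
candidates = [
    Candidate("ACDEFGHLK", "HLA-A*02:01", 35.0),
    Candidate("LMNPQRSTV", "HLA-A*02:01", 1200.0),
    Candidate("KLMNPQRSV", "HLA-B*07:02", 410.0),
]
shortlist = prioritize(candidates, top_n=2)
```

The shortlist is what would then go forward to biological assays. In practice the ranking usually combines affinity with other evidence, such as expression level, instead of relying on affinity alone.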
The unbiased screening approach used in this study identified two mutated antigens (KIF2C and POLA2) that were previously thought to be self-antigens. This strategy covered 720 non-synonymous somatic mutations encoded by 62 tandem minigenes (TMGs). The data were matched to three HLA molecules for ten neoantigens.
The best in vitro evidence that a candidate is a true neoantigen is preferential recognition by T cells in tumors that have a specific HLA-I expression pattern. To be considered immunogenic, the neoantigen must be presented by autologous APCs and processed by HLA-I-transfected target cells.
Provides rich annotation of candidate peptides
New research shows that neoantigens can be discovered by analyzing sequences from circulating tumor cells. By identifying candidate peptides with a lower false positive rate, Neopepsee could improve the immunogenicity of tumor cells. The method also allows researchers to focus the search for viable neoantigens on specific regions of the genome encoding specific proteins. These methods have the potential to advance research in predictive biomarkers, personalized cancer vaccines, and next-generation immunotherapy.
The dbPepNeo2.0 database contains a wealth of neoantigen data and neoantigen predictions. Its powerful search options include cancer, gene, mut peptide, HLA allele, and LC immunopeptidomes. Its melanoma search option is particularly useful as it offers neoantigen prediction tools for class I and II neoantigens.
In addition to providing rich annotation of candidate peptides, Neopepsee also includes a database of Chinese peptides. This database is the first of its kind to have an extensive Chinese-language version of candidate peptides. Its comprehensive collection of peptide sequences from multiple sources is useful in a wide range of applications, including drug discovery. These databases also feature information on gene expression, protein structure, and a host of other functional properties.
The database includes detailed information on 87 immunogenicity-related values. It also includes a three-level probability of neoantigen recognition. In comparison with validated datasets and conventional methods, Neopepsee improved the performance of many existing cancer immunology research tools. Neopepsee also reveals protein sequence similarity. For example, in melanoma and chronic lymphocytic leukemia, Neopepsee has significantly improved the results of neoantigen prediction.
dbPepNeo2.0 integrates BLASTdb, a powerful tool for immunogenicity prediction. BLASTdb searches peptides from dbPepNeo2.0 using local alignments. By examining sequence similarities, dbPepNeo2.0 provides rich annotation of candidate peptides for cancer immunotherapy research.
The database contains a comprehensive catalog of validated neoantigen peptides. Neopepsee also includes an extensive list of HLA peptidomes. High-confidence (HC) peptides are those validated by TCR recognition, while medium-confidence (MC) peptides have intermediate confidence. Low-confidence (LC) immunopeptidomes are raw peptides bound by HLA molecules.
Predicts 7 neoantigens per patient
The cancer immunogenicity of a tumor is a potential biomarker for selecting patients for immune therapy. If tumors have immunogenic mutations, these predicted neoantigens can be used to identify patients who will benefit from checkpoint blockade or related therapies. These neoantigens are predicted by a multimodal approach that combines data from somatic variants, RNA expression, and resultant epitopes with host HLA type.
Identifying candidate mutant peptides from DNA and RNA sequences requires complex analytical pipelines with multiple steps, including somatic mutation identification, peptide processing, and peptide-MHC binding prediction. Matched tumor-normal sequencing data support somatic mutation calling, and RNAseq enables the detection of variant expression.
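One of the intermediate steps, turning a single amino acid substitution into candidate peptides, can be sketched as a sliding window over the mutated protein sequence. This is a minimal illustration, not the implementation of any specific pipeline. The default lengths of 8 to 11 residues reflect the typical range for HLA class I peptides and are an assumption here; the function name and the example sequence are hypothetical.

```python
from typing import Iterator


def mutant_peptides(protein: str,
                    mut_index: int,
                    mut_residue: str,
                    lengths: tuple[int, ...] = (8, 9, 10, 11)) -> Iterator[tuple[str, int]]:
    """Yield (peptide, position) for every window that covers the mutated residue.

    mut_index is 0-based in the protein; position is 1-based within the peptide.
    """
    if not 0 <= mut_index < len(protein):
        raise ValueError("mut_index is outside the protein sequence")
    mutated = protein[:mut_index] + mut_residue + protein[mut_index + 1:]
    for k in lengths:
        first = max(0, mut_index - k + 1)        # earliest window containing the mutation
        last = min(mut_index, len(mutated) - k)  # latest window that still fits
        for start in range(first, last + 1):
            yield mutated[start:start + k], mut_index - start + 1


# Made-up 20-residue sequence with a hypothetical L-to-W substitution at index 9.
protein = "ACDEFGHIKLMNPQRSTVWY"
peptides = list(mutant_peptides(protein, 9, "W", lengths=(9,)))
```

Each resulting peptide would then be passed to a peptide-MHC binding predictor together with the patient's HLA alleles. Frameshifts, insertions, and fusions need different handling and are not covered by this sketch.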
Neoantigen prediction pipelines provide a summary of essential information for each variant, including the genomic coordinates, the ClinGen Allele Registry ID, and the Human Genome Variation Society (HGVS) variant name. The summary also reports the position of the variant within the candidate neoantigen peptide, binding affinity predictions, RNA variant allele frequency, and gene expression values.
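The summary described above maps naturally onto one record per candidate. The sketch below shows one possible layout written out as tab-separated text. The field names, units (nM for affinity, TPM for expression), and every example value are placeholders chosen for illustration and do not reproduce the output format of any particular pipeline.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class NeoantigenSummary:
    genomic_coordinates: str
    clingen_allele_id: str
    hgvs_name: str
    peptide: str
    variant_position_in_peptide: int  # 1-based
    predicted_ic50_nm: float
    rna_vaf: float                    # RNA variant allele frequency, 0 to 1
    gene_expression_tpm: float


def to_tsv(rows: list[NeoantigenSummary]) -> str:
    """Serialize summary records as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(NeoantigenSummary)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Placeholder values only; none of these identifiers refers to a real variant.
example = NeoantigenSummary(
    genomic_coordinates="chr1:100000",
    clingen_allele_id="CA0000000",
    hgvs_name="NM_000000.0:c.100A>T",
    peptide="CDEFGHIKW",
    variant_position_in_peptide=9,
    predicted_ic50_nm=35.0,
    rna_vaf=0.42,
    gene_expression_tpm=18.5,
)
report = to_tsv([example])
```

Keeping the record flat makes it easy to sort or filter candidates in a spreadsheet or with standard command-line tools.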