The COVID-19 pandemic has been one of the most serious social and economic disruptions in the contemporary world and remains an ongoing concern. Since the outbreak in China at the end of 2019, the disease has spread rapidly to nearly every country across the globe. As of 14 September 2020, the virus had infected almost 30 million people and caused more than 925,000 deaths. To fight back against the pandemic, countries around the world have been brought to a standstill in an effort to reduce transmission of the pathogen through preventive measures such as social distancing, sanitizing and wearing face masks.
Our immunology team has released a publication addressing the design of an epitope-based vaccine that induces a cellular immune response against the novel coronavirus. Taking into account the biology of virus-infected cells, the study applies machine-learning techniques to predict SARS-CoV-2 epitopes that can trigger an immune response. The proposed model is trained and tested on epitopes from publicly available experimental data on both peptide presentation and immunogenicity.
The final predictions combine the outputs of the presentation and immunogenicity models. Our presentation model was trained on curated datasets of peptides presented on the cell surface by HLA class I molecules. All positive examples were obtained from mass spectrometry experiments performed on human mono-allelic cell lines. Synthetic negative examples (non-presented peptides), in contrast, were selected from expressed proteins in the human reference proteome. The presentation model also served as the starting point for the immunogenicity model, which was fine-tuned on a curated set of immunogenicity data downloaded from the Immune Epitope Database (IEDB). For both models, the HLA allele and the peptide sequence were treated as separate inputs, only canonical HLA class I types were considered, and the analysis was restricted to peptides 8 to 11 amino acids in length.
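The training-set construction described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the study's code: the function names, the uniform sampling of negatives and the 2:1 negative-to-positive ratio are hypothetical. Only the 8-11 length window, the mono-allelic positives, the proteome-derived negatives and the separate allele and peptide inputs come from the text.

```python
import random

LENGTHS = range(8, 12)  # peptides of 8-11 amino acids, as in the study

def kmers(seq, lengths=LENGTHS):
    """Yield every contiguous peptide of the allowed lengths from one protein."""
    for k in lengths:
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k]

def build_examples(positives_by_allele, proteome, decoys_per_positive=2, seed=0):
    """Return (allele, peptide, label) triples, keeping allele and peptide as separate inputs.

    positives_by_allele: allele -> peptides eluted from that allele's mono-allelic cell line.
    proteome: sequences of expressed human proteins.
    Negatives (label 0) are drawn uniformly from proteome peptides never observed as presented
    (a hypothetical sampling scheme).
    """
    rng = random.Random(seed)
    seen = {p for peps in positives_by_allele.values() for p in peps}
    pool = sorted({p for seq in proteome for p in kmers(seq)} - seen)  # sorted => reproducible
    examples = []
    for allele, peptides in positives_by_allele.items():
        kept = [p for p in peptides if len(p) in LENGTHS]  # enforce the 8-11 length window
        examples += [(allele, p, 1) for p in kept]
        n_neg = min(decoys_per_positive * len(kept), len(pool))
        examples += [(allele, p, 0) for p in rng.sample(pool, n_neg)]
    return examples

# Toy inputs: one allele, three candidate positives (the 6-mer is filtered out).
positives = {"HLA-A*02:01": ["KTAYIAKQR", "SLYNTVATL", "GILGFV"]}
proteome = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]
for example in build_examples(positives, proteome):
    print(example)
```

With these toy inputs the function yields two positives and four sampled negatives, all of them 8 to 11 residues long and none of them an observed ligand.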
The aim of the study is to design a peptide vaccine composition that stimulates the cellular immune response by boosting the body's existing machinery for removing coronavirus-infected cells. Such effects could be induced by administering a vaccine that elicits a protective CD8+ cytotoxic T lymphocyte (CTL) response, or by transferring CD8+ T cells engineered to recognize specific viral antigens. We identify immunogenic SARS-CoV-2 epitopes in order to propose a vaccine composition targeted at T-cell activation. Several biological aspects were explored: the ability of viral epitopes to induce an effective T-cell response, the immune tolerance and potential toxicity of the peptides, the variability of the SARS-CoV-2 genome, and the population coverage of HLA alleles.
The study included a wide variety of the most common HLA alleles in order to achieve the largest possible population coverage with a restricted number of epitopes, reflecting the limited loading capacity often encountered in vaccine design. We have also shown that the predictions of our model correlate more strongly with experimentally measured peptide-HLA complex (pHLA) stability than those of other state-of-the-art tools that rely solely on predicted binding affinity or ligand likelihood.
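The trade-off between population coverage and a limited number of epitopes can be illustrated with a greedy selection under a fixed budget. This is a minimal sketch, not the study's method: the coverage formula assumes Hardy-Weinberg equilibrium within each HLA locus and independence between loci, and the greedy strategy, function names, peptide names and allele frequencies are all hypothetical.

```python
from math import prod

def coverage(alleles, freqs):
    """Estimated fraction of a population carrying at least one covered allele.

    freqs maps allele -> allele (gene) frequency. Assumes Hardy-Weinberg
    equilibrium within each locus and independence between loci.
    """
    locus_freq = {}
    for allele in alleles:
        locus = allele.split("*")[0]  # "HLA-A*02:01" -> "HLA-A"
        locus_freq[locus] = locus_freq.get(locus, 0.0) + freqs.get(allele, 0.0)
    # A person is uncovered only if both haplotypes miss every covered allele at every locus.
    p_uncovered = prod((1.0 - f) ** 2 for f in locus_freq.values())
    return 1.0 - p_uncovered

def greedy_select(epitope_alleles, freqs, budget):
    """Pick up to `budget` epitopes, each time adding the one that most increases coverage.

    epitope_alleles maps peptide -> set of HLA alleles predicted to present it.
    """
    chosen, covered = [], set()
    for _ in range(budget):
        remaining = [e for e in epitope_alleles if e not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda e: coverage(covered | epitope_alleles[e], freqs))
        chosen.append(best)
        covered |= epitope_alleles[best]
    return chosen, coverage(covered, freqs)

# Hypothetical allele frequencies and allele assignments, for illustration only.
freqs = {"HLA-A*02:01": 0.3, "HLA-A*01:01": 0.2, "HLA-B*07:02": 0.1}
epitope_alleles = {
    "pep1": {"HLA-A*02:01"},
    "pep2": {"HLA-A*01:01"},
    "pep3": {"HLA-B*07:02"},
}
print(greedy_select(epitope_alleles, freqs, budget=2))  # -> (['pep1', 'pep2'], 0.75)
```

Greedy selection is a simple heuristic for this budgeted coverage problem; it is not guaranteed to find the optimal set, and real coverage estimates would also need to account for linkage between loci.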
Our proposed vaccine composition includes T-cell epitopes from the SARS-CoV-2 proteome, covering both structural and non-structural proteins. Most of the selected peptides originate from conserved regions of the SARS-CoV-2 genome. Epitopes from the non-structural proteins not only induce an additional T-cell response, but could also be combined with standard components that boost the B-cell response to create more effective vaccines. Moreover, such a twin-track approach could reduce the risk of producing non-neutralizing antibodies, which was the main concern during the development of vaccines against MERS-CoV and SARS-CoV. In addition, in the early phase of viral infection the non-structural proteins are expressed at considerably higher levels than the structural ones, so stimulating an immune response against their peptides could also be beneficial in the early stages of the disease.
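Favouring peptides from conserved regions can be illustrated with a simple filter that keeps only peptides found unchanged in every available variant of their source protein. This is a simplified sketch, not the study's procedure: substring matching over unaligned sequences, the 100% conservation threshold and the toy sequences are our assumptions.

```python
def conservation(peptide, variants):
    """Fraction of variant protein sequences that contain the peptide unchanged."""
    if not variants:
        return 0.0
    return sum(peptide in seq for seq in variants) / len(variants)

def conserved_only(peptides, variants, threshold=1.0):
    """Keep peptides whose conservation across variants reaches the threshold."""
    return [p for p in peptides if conservation(p, variants) >= threshold]

# Toy variant sequences; the third carries a single V->A substitution.
variants = ["MFVFLVLLPLVSS", "MFVFLVLLPLVSS", "MFVFLVLLPLASS"]
peptides = ["MFVFLVLLP", "LVLLPLVSS"]
print({p: round(conservation(p, variants), 2) for p in peptides})
# -> {'MFVFLVLLP': 1.0, 'LVLLPLVSS': 0.67}
print(conserved_only(peptides, variants))  # -> ['MFVFLVLLP']
```

Lowering `threshold` trades strict conservation for a larger candidate pool; a production pipeline would more likely work on aligned genomes and position-specific mutation frequencies.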
Studies of the novel coronavirus have revealed high selective pressure and genetic variability in SARS-CoV-2, both of which contribute to the accelerated evolution and high mutation rate of the virus. Moreover, both factors could significantly limit the efficacy of the vaccines currently under development. Widespread use of a vaccine can also be delayed by the safety requirements for its approval and commercialization, which are possible only after a positive evaluation in Phase 3 clinical trials. Cross-reactivity must also be carefully considered in vaccine design, because some viral epitopes occur naturally in host proteins. Such epitopes may be tolerated by the host's immune system or, even worse, trigger an autoimmune response against normal proteins. Peptides with these characteristics were excluded from further analysis.
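Excluding viral peptides that also occur in host proteins can be sketched as a lookup against all host peptides of the same lengths. The text does not specify the matching criterion, so the exact-match rule, the optional Hamming-distance tolerance, the function names and the toy sequences below are assumptions; the brute-force mismatch scan is only practical for small inputs.

```python
def proteome_kmers(proteome, lengths=range(8, 12)):
    """All contiguous 8-11-mers from the host protein sequences."""
    return {
        seq[i:i + k]
        for seq in proteome
        for k in lengths
        for i in range(len(seq) - k + 1)
    }

def hamming(a, b):
    """Number of mismatched positions between two equal-length peptides."""
    return sum(x != y for x, y in zip(a, b))

def remove_self_peptides(peptides, proteome, max_mismatches=0):
    """Drop viral peptides identical (or nearly identical) to a host peptide."""
    self_kmers = proteome_kmers(proteome)
    kept = []
    for p in peptides:
        if p in self_kmers:
            continue  # exact match to a host peptide
        if max_mismatches and any(
            len(s) == len(p) and hamming(p, s) <= max_mismatches for s in self_kmers
        ):
            continue  # near match within the mismatch tolerance
        kept.append(p)
    return kept

host = ["MKTAYIAKQRQISFVKSHFSRQ"]  # toy host protein
viral = ["KTAYIAKQR", "KTAYIAKQL", "YLQPRTFLL"]  # toy candidate peptides
print(remove_self_peptides(viral, host))                    # -> ['KTAYIAKQL', 'YLQPRTFLL']
print(remove_self_peptides(viral, host, max_mismatches=1))  # -> ['YLQPRTFLL']
```

Allowing a small number of mismatches makes the filter more conservative, at the cost of discarding some candidates that might in practice be safe.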
Like other in-silico approaches, our method has limitations associated with the limited amount of available training data, which might affect the overall predictive performance of the model. However, further advances in machine learning algorithms, together with the growing number of experimentally validated samples, will foster the improvement of epitope-based vaccine design strategies. Lower costs, wider availability and improved experimental procedures will significantly accelerate the evolution of this field. Despite the considerable room for improvement, the solutions already provided are of high value and represent a tangible contribution to the fight against COVID-19.
AI aided design of epitope-based vaccine for the induction of cellular immune responses against SARS CoV-2. G. Mazzocco, I. Niemiec, A. Myronov, P. Skoczylas, J. Kaczmarczyk, A. Sanecka-Duin, K. Gruba, P. Król, M. Drwal, M. Szczepanik, K. Pyrc, P. Stępniak. bioRxiv 2020.08.26.267997; https://www.biorxiv.org/content/10.1101/2020.08.26.267997v1.article-metrics