Catalogue Search | MBRL

Amino Acid Encoding for Deep Learning Applications

by Bromberg, Yana , Lenz, Tobias , Wendorff, Mareike in Algorithms , Amino acid encoding , Amino acids

2020

Background: The number of applications of deep learning algorithms in bioinformatics is increasing as they usually achieve superior performance over classical approaches, especially when bigger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as a continuous vector through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the continuous iteration of the model to optimize the target prediction – a process called ‘end-to-end learning’ – has led to state-of-the-art results in many fields. Although usage of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared to more classical manually-curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, to end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN. Results: By using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacities. We found that the embedding dimension is a major factor in controlling the model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension. Conclusion: Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect provided by the scheme.
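
The benchmarking idea in the abstract above, comparing an encoding scheme against random vectors of the same dimension, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the example peptide, the uniform sampling range and the seed are all assumptions made for illustration, and only the standard library is used.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_table():
    """Map each residue to a 20-dimensional indicator vector."""
    size = len(AMINO_ACIDS)
    return {
        aa: [1.0 if i == j else 0.0 for j in range(size)]
        for i, aa in enumerate(AMINO_ACIDS)
    }


def random_table(dim, seed=0):
    """Map each residue to a fixed random vector of length `dim`.

    The vectors carry no biochemical information; they only make the
    residues distinguishable from one another (assumed baseline).
    """
    rng = random.Random(seed)
    return {aa: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}


def encode(sequence, table):
    """Encode a peptide as a list of per-residue vectors."""
    return [table[aa] for aa in sequence]


peptide = "ACDK"  # hypothetical example peptide
one_hot = encode(peptide, one_hot_table())
baseline = encode(peptide, random_table(dim=20))
```

Both encodings have the same shape here (4 residues by 20 dimensions), so a model trained on either one would differ only in the information the vectors carry, which is the comparison the abstract recommends.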

Journal Article

Immunopeptidomics toolkit library (IPTK): a python-based modular toolbox for analyzing immunopeptidomics data

by Degenhardt, Frauke , Bacher, Petra , Wendorff, Mareike in Adaptive systems , Algorithms , Analysis

2021

Background: The human leukocyte antigen (HLA) proteins play a fundamental role in the adaptive immune system as they present peptides to T cells. Mass-spectrometry-based immunopeptidomics is a promising and powerful tool for characterizing the immunopeptidomic landscape of HLA proteins, that is, the peptides presented on HLA proteins. Despite the growing interest in the technology, and the recent rise of immunopeptidomics-specific identification pipelines, there is still a gap in data-analysis and software tools that are specialized in analyzing and visualizing immunopeptidomics data. Results: We present the IPTK library, which is an open-source Python-based library for analyzing, visualizing, comparing, and integrating different omics layers with the identified peptides for an in-depth characterization of the immunopeptidome. Using different datasets, we illustrate the ability of the library to enrich the result of the identified peptidomes. Also, we demonstrate the utility of the library in developing other software and tools by developing an easy-to-use dashboard that can be used for the interactive analysis of the results. Conclusion: IPTK provides a modular and extendable framework for analyzing and integrating immunopeptidomes with different omics layers. The library is deployed into PyPI at https://pypi.org/project/IPTKL/ and into Bioconda at https://anaconda.org/bioconda/iptkl , while the source code of the library and the dashboard, along with the online tutorials, are available at https://github.com/ikmb/iptoolkit .

Journal Article

Unbiased Characterization of Peptide-HLA Class II Interactions Based on Large-Scale Peptide Microarrays; Assessment of the Impact on HLA Class II Ligand and Epitope Prediction

by Degenhardt, Frauke , Wendorff, Mareike , Østerbye, Thomas in Algorithms , Amino acids , Antigen Presentation

2020

Human Leukocyte Antigen class II (HLA-II) molecules present peptides to T lymphocytes and play an important role in adaptive immune responses. Characterizing the binding specificity of single HLA-II molecules has profound implications for understanding cellular immunity, identifying the cause of autoimmune diseases, and developing immunotherapeutics and vaccines. Here, novel high-density peptide microarray technology combined with machine learning techniques was used to address this task at an unprecedented level of throughput. Microarrays with over 200,000 defined peptides were assayed with four exemplary HLA-II molecules. Machine learning was applied to mine the signals. Comparing the identified binding motifs, and the power for predicting eluted ligands and CD4+ epitope datasets, with those obtained using NetMHCIIpan-3.2 confirmed the high quality of the chip readout. These results suggest that the proposed microarray technology offers a novel and unique platform for large-scale unbiased interrogation of peptide binding preferences of HLA-II molecules.

Journal Article

Genomewide Association Study of Severe Covid-19 with Respiratory Failure

by Blanco-Grau, Albert , Scudeller, Luigia , Pesenti, Antonio in ABO Blood-Group System - genetics , ABO system , Aged

2020

During the peak of hospitalizations of patients with severe Covid-19 in Italy and Spain in March, a group of researchers in these and other countries obtained and analyzed samples, resulting in the identification of two chromosomal loci associated with the disorder.

Journal Article

Autoantibody-negative insulin-dependent diabetes mellitus after SARS-CoV-2 infection: a case report

by Hollstein, Tim , Ziegler, Anette G. , Bonifacio, Ezio in 631/326/596/4130 , 631/443/319/1642/137/1418 , 692/163/2743/137/1418

2020

Here we report a case where the manifestations of insulin-dependent diabetes occurred following SARS-CoV-2 infection in a young individual in the absence of autoantibodies typical for type 1 diabetes mellitus. Specifically, a 19-year-old white male presented at our emergency department with diabetic ketoacidosis, C-peptide level of 0.62 µg l⁻¹, blood glucose concentration of 30.6 mmol l⁻¹ (552 mg dl⁻¹) and haemoglobin A1c of 16.8%. The patient's case history revealed probable COVID-19 infection 5–7 weeks before admission, based on a positive test for antibodies against SARS-CoV-2 proteins as determined by enzyme-linked immunosorbent assay. Interestingly, the patient carried a human leukocyte antigen genotype (HLA DR1-DR3-DQ2) considered to provide only a slightly elevated risk of developing autoimmune type 1 diabetes mellitus. However, as noted, no serum autoantibodies were observed against islet cells, glutamic acid decarboxylase, tyrosine phosphatase, insulin and zinc-transporter 8. Although our report cannot fully establish causality between COVID-19 and the development of diabetes in this patient, considering that SARS-CoV-2 entry receptors, including angiotensin-converting enzyme 2, are expressed on pancreatic β-cells and, given the circumstances of this case, we suggest that SARS-CoV-2 infection, or COVID-19, might negatively affect pancreatic function, perhaps through direct cytolytic effects of the virus on β-cells. The authors report the case of a young patient who displayed insulin-dependent diabetes after SARS-CoV-2 infection in the absence of autoantibodies indicative of autoimmune type 1 diabetes.

Journal Article

A novel unconventional T cell population enriched in Crohn’s disease

by Hübenthal, Matthias , Bacher, Petra , Wendorff, Mareike in alpha beta T cells , Amino acids , Antigens

2022

Objective: One of the current hypotheses to explain the proinflammatory immune response in IBD is a dysregulated T cell reaction to yet unknown intestinal antigens. As such, it may be possible to identify disease-associated T cell clonotypes by analysing the peripheral and intestinal T-cell receptor (TCR) repertoire of patients with IBD and controls. Design: We performed bulk TCR repertoire profiling of both the TCR alpha and beta chains using high-throughput sequencing in peripheral blood samples of a total of 244 patients with IBD and healthy controls as well as from matched blood and intestinal tissue of 59 patients with IBD and disease controls. We further characterised specific T cell clonotypes via single-cell RNAseq. Results: We identified a group of clonotypes, characterised by semi-invariant TCR alpha chains, to be significantly enriched in the blood of patients with Crohn’s disease (CD) and particularly expanded in the CD8+ T cell population. Single-cell RNAseq data showed an innate-like phenotype of these cells, with a comparable gene expression to unconventional T cells such as mucosal associated invariant T and natural killer T (NKT) cells, but with distinct TCRs. Conclusions: We identified and characterised a subpopulation of unconventional Crohn-associated invariant T (CAIT) cells. Multiple lines of evidence suggest that these cells are part of the NKT type II population. The potential implications of this population for CD or a subset thereof remain to be elucidated, and the immunophenotype and antigen reactivity of CAIT cells need further investigation in future studies.

Journal Article

Amino Acid Encoding for Deep Learning Applications

by Bromberg, Yana , Lenz, Tobias , Wendorff, Mareike in Chemistry and Materials (General)

2020

Background: The number of applications of deep learning algorithms in bioinformatics is increasing as they usually achieve superior performance over classical approaches, especially when bigger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as a continuous vector through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the continuous iteration of the model to optimize the target prediction – a process called ‘end-to-end learning’ – has led to state-of-the-art results in many fields. Although usage of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared to more classical manually-curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, to end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN. Results: By using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacities. We found that the embedding dimension is a major factor in controlling the model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension. Conclusion: Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect provided by the scheme.

Journal Article

VCF2Prot: An Efficient and Parallel Tool for Generating Personalized Proteomes from VCF Files

by Franke, Andre , Degenhardt, Frauke , Wendorff, Mareike in Algorithms , Cancer vaccines , Genomics

2022

Motivation: The ability to generate sample-specific protein sequences is a crucial step in neo-antigen discovery, cancer vaccine development, and proteogenomics. The revolutionary increase in the throughput of sequencers has fueled large-scale genomic and transcriptomic studies, holding great promise for the emerging field of personalized medicine. However, most sequencing projects store their sequencing data in an abbreviated variant calling format (VCF) that is not immediately amenable to subsequent proteomic and peptidomic analyses. Furthermore, data processing of such increasingly massive genome-scale datasets calls for parallel and concurrent programming, and consequently refactoring of existing algorithms and/or the development of new parallel algorithms. Results: Here, we introduce sequence intermediate representation (SIR), a novel and generic algorithm for generating personalized or sample-specific protein sequences from a consequence-called VCF file and the corresponding reference proteome. An implementation of SIR, named VCF2Prot, was developed to aid personalized medicine and proteogenomics by generating personalized proteomes in FASTA format from a collection of consequence-called genomic alterations stored in a VCF file. Benchmarking VCF2Prot against the recently published PrecisionProDB showed an ~1000-fold improvement in runtime (depending on the input size). Furthermore, in a scale-up study VCF2Prot processed a VCF file containing 99,254 variants observed across 8,192 patients in ~11 minutes, demonstrating the massive improvement in the execution speed and the utility of SIR and VCF2Prot in bridging large-scale genomic and proteomic studies. Availability and Implementation: VCF2Prot comes with a permissive MIT license, enabling the commercial and non-commercial utilization of the tool. The source code along with precompiled versions for Linux/Mac OS are available at https://github.com/ikmb/vcf2prot. The modular units used for building VCF2Prot are available as a Rust crate at https://crates.io/crates/ppgg with documentation and examples at https://docs.rs/ppgg/0.1.4/ppgg/ under the same MIT license. Competing Interest Statement: The authors have declared no competing interest.
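
As a rough illustration of the task described in this abstract, the Python sketch below applies single-residue substitutions to a reference protein and writes the result as a FASTA record. It is not the SIR algorithm or the VCF2Prot/ppgg API: the function names, the (position, reference residue, alternative residue) tuple format and the example sequence are hypothetical, and VCF parsing, indels, frameshifts and parallelism are left out.

```python
def apply_substitutions(reference, substitutions):
    """Return a copy of `reference` with single-residue substitutions applied.

    `substitutions` is an iterable of (position, ref_residue, alt_residue)
    tuples with 1-based positions (an assumed, simplified input format).
    """
    residues = list(reference)
    for position, ref_residue, alt_residue in substitutions:
        # Guard against applying a variant to the wrong reference sequence.
        if residues[position - 1] != ref_residue:
            raise ValueError(f"reference mismatch at position {position}")
        residues[position - 1] = alt_residue
    return "".join(residues)


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


reference = "MKTAYIAKQR"  # hypothetical reference protein
personalized = apply_substitutions(reference, [(4, "A", "V"), (10, "R", "H")])
record = to_fasta("sample1_protein1", personalized)
```

The reference-residue check is a design choice in this sketch: it fails early when the variant list and the reference proteome do not belong together, rather than silently producing a wrong sequence.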

Paper

Predicting Peptide HLA-II Presentation Using Immunopeptidomics, Transcriptomics and Deep Multimodal Learning

by Degenhardt, Frauke , Bacher, Petra , Wendorff, Mareike in Antigen presentation , Bioinformatics , CD4 antigen

2022

The human leukocyte antigen (HLA) class II proteins present peptides to CD4+ T cells through an interaction with T cell receptors (TCRs). Thus, HLA proteins are key players in shaping immunogenicity and immunodominance. Nevertheless, factors governing peptide presentation by HLA-II proteins are still poorly understood. To address this problem, we profiled the blood transcriptome and immunopeptidome of 20 healthy individuals and integrated the profiles with publicly available immunopeptidomics datasets. In-depth multi-omics analysis identified expression levels and subcellular locations as important sequence-independent features governing presentation. Leveraging this knowledge, we developed the Peptide Immune Annotator Multimodal (PIA-M) tool, a novel pan multimodal transformer-based framework that utilises sequence-dependent along with sequence-independent features to model presentation by HLA-II proteins. PIA-M showed consistently superior performance relative to existing tools across two independent test datasets (area under the curve: 0.93 vs. 0.84 and 0.95 vs. 0.86, respectively). Besides achieving a higher predictive accuracy, PIA-M, with its Rust-based pre-processing engine, had significantly shorter runtimes. PIA-M is freely available with a permissive licence as a standalone pipeline and as a webserver (https://hybridcomputing.ikmb.uni-kiel.de/pia). In conclusion, PIA-M enables a new state-of-the-art accuracy in predicting peptide presentation by HLA-II proteins in vivo. Competing Interest Statement: The authors have declared no competing interest.

Paper
