Catalogue Search | MBRL

Tightening the (neural) net for protein structure prediction

by Bromberg, Yana in Bioinformatics , Genomes , Learning algorithms

2022

In this Journal Club article, Yana Bromberg discusses an early application of machine learning for protein structure prediction — a paper that shaped her career. It illustrates the value of ensuring that machine learning approaches are rooted in known biological principles.

Journal Article

Chapter 15: Disease Gene Prioritization

by Bromberg, Yana in Animals , Biology , Computational biology

2013

Disease-causing aberrations in the normal function of a gene define that gene as a disease gene. Proving a causal link between a gene and a disease experimentally is expensive and time-consuming. Comprehensive prioritization of candidate genes prior to experimental testing drastically reduces the associated costs. Computational gene prioritization is based on various pieces of correlative evidence that associate each gene with the given disease and suggest possible causal links. A fair amount of this evidence comes from high-throughput experimentation. Thus, well-developed methods are necessary to reliably deal with the quantity of information at hand. Existing gene prioritization techniques already significantly improve the outcomes of targeted experimental studies. Faster and more reliable techniques that account for novel data types are necessary for the development of new diagnostics, treatments, and cures for many diseases.

Journal Article

Better prediction of functional effects for sequence variants

by Rost, Burkhard , Bromberg, Yana , Hecht, Maximilian in Animal Genetics and Genomics , Biomedical and Life Sciences , Computational Biology

2015

Elucidating the effects of naturally occurring genetic variation is one of the major challenges for personalized health and personalized medicine. Here, we introduce SNAP2, a novel neural network-based classifier that improves over the state-of-the-art in distinguishing between effect and neutral variants. Our method's improved performance results from screening many potentially relevant protein features and from refining our development data sets. Cross-validated on >100k experimentally annotated variants, SNAP2 significantly outperformed other methods, attaining a two-state accuracy (effect/neutral) of 83%. SNAP2 also outperformed combinations of other methods. Performance increased for human variants but much more so for other organisms. Our method's carefully calibrated reliability index informs selection of variants for experimental follow-up, with the most strongly predicted half of all effect variants predicted at over 96% accuracy. As expected, the evolutionary information from automatically generated multiple sequence alignments gave the strongest signal for the prediction. However, we also optimized our new method to perform surprisingly well even without alignments. This feature reduces prediction runtime by over two orders of magnitude, enables cross-genome comparisons, and renders our new method the best solution for the 10-20% of sequence orphans. SNAP2 is available at: https://rostlab.org/services/snap2web

Definitions used: Delta, input feature that results from computing the difference between the feature scores for the native amino acid and the feature scores for the variant amino acid; nsSNP, non-synonymous SNP; PMD, Protein Mutant Database; SNAP, Screening for non-acceptable polymorphisms; SNP, single nucleotide polymorphism; variant, any amino acid-changing sequence variant.

Journal Article

Amino Acid Encoding for Deep Learning Applications

by Bromberg, Yana , Lenz, Tobias , Wendorff, Mareike in Algorithms , Amino acid encoding , Amino acids

2020

Background: The number of applications of deep learning algorithms in bioinformatics is increasing as they usually achieve superior performance over classical approaches, especially when bigger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as a continuous vector through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the continuous iteration of the model to optimize the target prediction, a process called ‘end-to-end learning’, has led to state-of-the-art results in many fields. Although usage of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared to more classical manually-curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, to end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN. Results: By using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacities. We found that the embedding dimension is a major factor in controlling the model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension. Conclusion: Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect provided by the scheme.

Journal Article

Correlating protein function and stability through the analysis of single amino acid substitutions

by Rost, Burkhard , Bromberg, Yana in Algorithms , Amino Acid Substitution , Amino acids

2009

Background: Mutations resulting in the disruption of protein function are the underlying causes of many genetic diseases. Some mutations affect the number of expressed proteins while others alter the activity on a per-molecule basis. Single amino acid substitutions as caused by non-synonymous Single Nucleotide Polymorphisms (nsSNPs) often disrupt function by altering protein structure and/or stability, but can also wreak havoc by directly impacting functional binding sites. Given the experimental three-dimensional (3D) structure of a protein, we can try to differentiate between the "effect on structure/stability" and the "effect on binding". However, experimental 3D structures are available for only 1% of all known proteins; the magnitude of stability change caused by a given mutation is more widely available. Results: Here, we analyze to what extent the functional effect of a mutation can be predicted from the effect on protein stability. We find that simple sequence-based methods succeed in predicting functional effects of nsSNPs. In fact, such methods consistently outperform approaches that predict functional change through the application of binary thresholds to stability change. We also observed that if stability is affected, functional change is easier to predict than when stability is not affected. Conclusion: Our results confirmed that stability change is somehow related to function change. However, we also show that the knowledge of stability changes in no way suffices to predict functional changes and that many function-changing mutations have no effect on stability.

Journal Article

Computational prediction shines light on type III secretion origins

by Rost, Burkhard , Goldberg, Tatyana , Bromberg, Yana in 631/114/2410 , 631/326/2565/2142 , Amino acid sequence

2016

The type III secretion system is a key bacterial symbiosis and pathogenicity mechanism responsible for a variety of infectious diseases, ranging from food-borne illnesses to the bubonic plague. In many Gram-negative bacteria, the type III secretion system transports effector proteins into host cells, converting resources to bacterial advantage. Here we introduce a computational method that identifies type III effectors by combining homology-based inference with de novo predictions, reaching up to 3-fold higher performance than existing tools. Our work reveals that signals for recognition and transport of effectors are distributed over the entire protein sequence instead of being confined to the N-terminus, as was previously thought. Our scan of hundreds of prokaryotic genomes identified previously unknown effectors, suggesting that type III secretion may have evolved prior to the archaea/bacteria split. Crucially, our method performs well for short sequence fragments, facilitating evaluation of microbial communities and rapid identification of bacterial pathogenicity, with no genome assembly required. pEffect and its data sets are available at http://services.bromberglab.org/peffect.

Journal Article

Evolutionary history of redox metal-binding domains across the tree of life

by Bhattacharya, Debashish , Bromberg, Yana , Falkowski, Paul G. in active sites , Adrenodoxin - chemistry , Adrenodoxin - metabolism

2014

Oxidoreductases mediate electron transfer (i.e., redox) reactions across the tree of life and ultimately facilitate the biologically driven fluxes of hydrogen, carbon, nitrogen, oxygen, and sulfur on Earth. The core enzymes responsible for these reactions are ancient, often small in size, and highly diverse in amino acid sequence, and many require specific transition metals in their active sites. Here we reconstruct the evolution of metal-binding domains in extant oxidoreductases using a flexible network approach and permissive profile alignments based on available microbial genome data. Our results suggest there were at least 10 independent origins of redox domain families. However, we also identified multiple ancient connections between Fe₂S₂- (adrenodoxin-like) and heme- (cytochrome c) binding domains. Our results suggest that these two iron-containing redox families had a single common ancestor that underwent duplication and divergence. The iron-containing protein family constitutes ∼50% of all metal-containing oxidoreductases and potentially catalyzed redox reactions in the Archean oceans. Heme-binding domains seem to be derived via modular evolutionary processes that ultimately form the backbone of redox reactions in both anaerobic and aerobic respiration and photosynthesis. The empirically discovered network allows us to peer into the ancient history of microbial metabolism on our planet.

Journal Article

VarI-COSI 2018: a forum for research advances in variant interpretation and diagnostics

by Bromberg, Yana , Capriotti, Emidio , Carter, Hannah in Animal Genetics and Genomics , Biomedical and Life Sciences , Control

2019

Journal Article

Ten simple rules for drawing scientific comics

by Bromberg, Yana , Partridge, Matthew , McDermott, Jason E. in Audiences , Bioinformatics , Biology and Life Sciences

2018

Institutions around the world are fighting to improve science communication all the time. From calls for journal papers to be simplified to encouraging scientists to take more of an active role through community engagement, there is an impetus to demystify and improve public understanding and engagement with science. Technology has greatly helped expand the range of learning styles that a lecturer can call on to reach people in new ways. Social media outlets like Twitter, Facebook, Instagram, and Tumblr have expanded the reach of science communication within and across scientific disciplines and to the lay public. Here, with all the videos, interactive quizzes, and instant feedback it can be easy to overlook one of the simplest methods for communicating complex ideas: comics.

Journal Article

Human Papillomavirus, Human Immunodeficiency Virus, and Oral Microbiota Interplay in Nigerian Youth (HOMINY): A Prospective Cohort Study Protocol

by Idemudia, Nosakhare , Obuekwe, Ozoemene , Bromberg, Yana in Adolescent , Antiretroviral drugs , Cervical cancer

2025

Introduction: Persistent oral infections with high-risk human papillomavirus (HR-HPV) are a potential cause of most oropharyngeal cancers (OPCs). Oral HR-HPV infection and persistence are significantly higher in people living with HIV (PLWH). Most data on oral HR-HPV in PLWH come from developed countries or adult cohorts. This study aims to investigate oral HR-HPV susceptibility and persistence among children and adolescents living with HIV (CALHIV) and to understand the roles of perinatal HIV exposure, infection, antiretroviral treatment, and the oral microbiome. Methods and analysis: This prospective cohort study is ongoing at the University of Benin Teaching Hospital (UBTH), Nigeria, involving mother-child pairs followed at 6-month intervals for 2 years. Participants include children aged 9–18 and their mothers aged 18 and above. The study targets 690 adolescents in three groups: 230 CALHIV, 230 HIV-exposed but uninfected and 230 HIV-unexposed and uninfected. Oral rinse, saliva, buccal swabs and supragingival plaque samples are collected at each visit. Blood samples are tested for HIV, Hepatitis B virus (HBV) and Hepatitis C virus (HCV), with CD4, CD8 and full blood counts performed. Oral HPV is assessed for incidence, persistence, and clearance. Statistical analyses to look for associations between cohort baseline characteristics and findings will be conducted using univariable and multivariable models for repeated data and high-dimensional microbiome data. All statistical tests will be two-sided; a p value <0.05 will indicate significance. Multiple comparisons will be adjusted using the False Discovery Rate (FDR) correction to control for Type I error. Ethics and dissemination: The study was approved by Rutgers State University (Pro2022000949) and the UBTH (ADM/E22/A/VOL. VII/14813674). Informed consent was obtained from all parents/guardians.

Journal Article
