Catalogue Search | MBRL

Highly accurate protein structure prediction for the human proteome

by Nikolov, Stanislav; Senior, Andrew W.; Zielinski, Michal in 631/114/1305, 631/114/2411, 631/1647/2067

2021

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold 2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective. AlphaFold is used to predict the structures of almost all of the proteins in the human proteome; the availability of high-confidence predicted structures could enable new avenues of investigation from a structural perspective.

Journal Article
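The abstract reports residue-level coverage at two confidence levels, with the very-high set being a subset of the confident set. Assuming per-residue confidence is expressed as the pLDDT score (0 to 100) with the bands used by the AlphaFold Protein Structure Database (above 70 counted as confident, above 90 as very high), a minimal sketch of how such pooled fractions could be tallied follows. The function name, inputs and toy numbers are hypothetical and are not taken from the dataset.

```python
from typing import Dict, Sequence

# Assumed pLDDT cut-offs; the abstract itself does not name the score or bands.
CONFIDENT = 70.0
VERY_HIGH = 90.0


def residue_coverage(plddt_per_protein: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Fraction of all residues, pooled over proteins, above each confidence band."""
    total = confident = very_high = 0
    for scores in plddt_per_protein:
        for s in scores:
            total += 1
            if s > CONFIDENT:
                confident += 1
            if s > VERY_HIGH:  # every very-high residue is also counted as confident
                very_high += 1
    if total == 0:
        raise ValueError("no residues supplied")
    return {"confident": confident / total, "very_high": very_high / total}
```

For example, `residue_coverage([[95.0, 80.0, 40.0], [92.0, 60.0]])` pools five residues, three above 70 and two above 90, giving fractions of 0.6 and 0.4. Pooling over residues rather than averaging per protein matches the abstract's "of all residues" framing.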

Deep Learning of Sea Surface Temperature Patterns to Identify Ocean Extremes

by Cornillon, Peter C.; Prochaska, J. Xavier; Reiman, David M. in algorithms, color, data collection

2021

We performed an out-of-distribution (OOD) analysis of ∼12,000,000 semi-independent 128 × 128 pixel² sea surface temperature (SST) regions, which we define as cutouts, from all nighttime granules in the MODIS R2019 Level-2 public dataset to discover the most complex or extreme phenomena at the ocean’s surface. Our algorithm (ULMO) is a probabilistic autoencoder (PAE), which combines two deep learning modules: (1) an autoencoder, trained on ∼150,000 random cutouts from 2010, to represent any input cutout with a 512-dimensional latent vector akin to a (non-linear) Empirical Orthogonal Function (EOF) analysis; and (2) a normalizing flow, which maps the autoencoder’s latent space distribution onto an isotropic Gaussian manifold. From the latter, we calculated a log-likelihood (LL) value for each cutout and defined outlier cutouts to be those in the lowest 0.1% of the distribution. These exhibit large gradients and patterns characteristic of a highly dynamic ocean surface, and many are located within larger complexes whose unique dynamics warrant future analysis. Without guidance, ULMO consistently locates the outliers where the major western boundary currents separate from the continental margin. Prompted by these results, we began exploring the fundamental patterns learned by ULMO, thereby identifying several compelling examples. Future work may find that algorithms such as ULMO hold significant potential to learn and derive other, not-yet-identified behaviors in the ocean from the many archives of satellite-derived SST fields. We see no impediment to applying them to other large remote-sensing datasets for ocean science (e.g., SSH and ocean color).

Journal Article
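The scoring pipeline the abstract describes (encode each cutout to a latent vector, map latents onto an isotropic Gaussian, compute a log-likelihood, flag the lowest 0.1%) can be sketched with NumPy. This is a toy stand-in, not ULMO itself: PCA replaces the trained autoencoder (in the spirit of the EOF analogy the abstract draws), a per-dimension affine standardization replaces the learned normalizing flow, and all function names are hypothetical.

```python
import numpy as np


def fit_toy_pae(train: np.ndarray, latent_dim: int):
    """Fit toy stand-ins for both PAE modules on flattened training cutouts."""
    mean = train.mean(axis=0)
    # Linear "encoder": leading principal directions (EOF-like modes).
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:latent_dim]
    latents = (train - mean) @ components.T
    # Affine "flow": per-dimension shift and scale toward N(0, I).
    mu = latents.mean(axis=0)
    sigma = latents.std(axis=0) + 1e-8
    return mean, components, mu, sigma


def log_likelihood(x, mean, components, mu, sigma):
    """Log-likelihood of each cutout's latent vector under the toy flow."""
    h = (x - mean) @ components.T          # encode rows of x to latent vectors
    z = (h - mu) / sigma                   # flow: latent -> standard normal
    d = z.shape[1]
    log_pz = -0.5 * np.sum(z**2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
    log_det = -np.sum(np.log(sigma))       # log|det dz/dh| of the affine map
    return log_pz + log_det                # change-of-variables formula


def flag_outliers(ll: np.ndarray, fraction: float = 0.001) -> np.ndarray:
    """Boolean mask of cutouts in the lowest `fraction` of log-likelihoods."""
    return ll <= np.quantile(ll, fraction)
```

Fitting on random flattened "cutouts" and then scoring a fresh batch in which a few rows have much larger amplitude should place those rows at the bottom of the LL distribution, so they are the ones flagged. In a real PAE both modules are learned, which is what lets the flagged cutouts reflect structure such as strong gradients rather than amplitude alone.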

Accurate structure prediction of biomolecular interactions with AlphaFold 3

by O’Neill, Michael; Low, Caroline M. R.; Zielinski, Michal in 631/114/1305, 631/114/2411, 631/154

2024

The introduction of AlphaFold 2 [1] has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design [2–6]. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein–ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein–nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody–antigen prediction accuracy compared with AlphaFold-Multimer v.2.3 [7, 8]. Together, these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework. AlphaFold 3 has a substantially updated architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues with greatly improved accuracy over many previous specialized tools.

Journal Article

Highly accurate protein structure prediction with AlphaFold

by Nikolov, Stanislav; Senior, Andrew W.; Zielinski, Michal in 631/114/1305, 631/114/2411, 631/535

2021

Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1–4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6, 7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’ [8]) has been an important open research problem for more than 50 years [9]. Despite recent progress [10–14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm. AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.

Journal Article

Addendum: Accurate structure prediction of biomolecular interactions with AlphaFold 3

by O’Neill, Michael; Low, Caroline M. R.; Zielinski, Michal in 631/114/1305, 631/114/2411, 631/154

2024

Journal Article

Development of the Human Infant Intestinal Microbiota

by Bik, Elisabeth M; Palmer, Chana; Brown, Patrick O in Adult, Age Factors, Bacteria

2007

Almost immediately after a human being is born, so too is a new microbial ecosystem, one that resides in that person's gastrointestinal tract. Although it is a universal and integral part of human biology, the temporal progression of this process, the sources of the microbes that make up the ecosystem, how and why it varies from one infant to another, and how the composition of this ecosystem influences human physiology, development, and disease are still poorly understood. As a step toward systematically investigating these questions, we designed a microarray to detect and quantitate the small subunit ribosomal RNA (SSU rRNA) gene sequences of most currently recognized species and taxonomic groups of bacteria. We used this microarray, along with sequencing of cloned libraries of PCR-amplified SSU rDNA, to profile the microbial communities in an average of 26 stool samples each from 14 healthy, full-term human infants, including a pair of dizygotic twins, beginning with the first stool after birth and continuing at defined intervals throughout the first year of life. To investigate possible origins of the infant microbiota, we also profiled vaginal and milk samples from most of the mothers, and stool samples from all of the mothers, most of the fathers, and two siblings. The composition and temporal patterns of the microbial communities varied widely from baby to baby. Despite considerable temporal variation, the distinct features of each baby's microbial community were recognizable for intervals of weeks to months. The strikingly parallel temporal patterns of the twins suggested that incidental environmental exposures play a major role in determining the distinctive characteristics of the microbial community in each baby. By the end of the first year of life, the idiosyncratic microbial ecosystems in each baby, although still distinct, had converged toward a profile characteristic of the adult gastrointestinal tract.

Journal Article

Neural Probabilistic Modeling for Astrophysics and Galaxy Evolution

by Reiman, David in Astronomy, Astrophysics, Computer science

2020

Statistical modeling in modern astrophysics and cosmology frequently involves simplified analytic models which fail to capture the underlying complexity of the problem at hand. Although traditional machine learning methods have proved useful, they scale poorly with dataset size and will struggle to make optimal use of the vast quantities of data soon to be produced by near-future surveys. Recently, neural networks have provided a powerful new foothold for probabilistic modeling in the presence of large data volumes by acting as expressive universal function approximators. In this thesis, I consider two applications of neural networks. (i) I use a convolutional neural network with an adversarial regularizing loss to deblend superimposed galaxy images. I focus on two-component blends and show that a model trained with a combination of supervised pixel-wise and regularizing adversarial losses provides high-fidelity deblended images. (ii) I use normalizing flows, a neural density estimation technique, to model the distribution over intrinsic quasar continua near Lyman-alpha given the redward spectrum. I then constrain the timeline of the Epoch of Reionization by measuring the neutral fraction of hydrogen in the spectra of two z > 7 quasars. I also describe ongoing work on recovering the local density field in the vicinity of distant galaxies with attention-based graph neural networks, and on likelihood-free inference for astrophysical simulator models with intractable likelihoods.

Dissertation
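The abstract does not spell out how a normalizing flow yields a density. The standard change-of-variables identity behind conditional flows is shown below; the notation (x for the continuum near Lyman-alpha, y for the redward spectrum used as the conditioning input, f for the invertible network) is assumed here for illustration and is not taken from the thesis.

```latex
\log p_{X \mid Y}(x \mid y)
  = \log p_Z\big(f(x; y)\big)
  + \log \left| \det \frac{\partial f(x; y)}{\partial x} \right|,
\qquad p_Z = \mathcal{N}(0, I)
```

Because f is invertible in x for each y, sampling runs the map in reverse: draw z from the standard normal and set x = f^{-1}(z; y).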

Understanding health disparities

by Moufarrej, Mira N; Aghaeepour, Nima; Druzin, Maurice L in Birth, Childbirth & labor, Children

2019

Based upon our recent insights into the determinants of preterm birth, which is the leading cause of death in children under five years of age worldwide, we describe potential analytic frameworks that provide both a common understanding and, ultimately, the basis for effective, ameliorative action. Our research on preterm birth serves as an example that the framing of any human health condition is a result of complex interactions between the genome and the exposome. New discoveries of the basic biology of pregnancy, such as the complex immunological and signaling processes that dictate the health and length of gestation, have revealed a complexity in the interactions (current and ancestral) between genetic and environmental forces. Understanding these relationships may help reduce disparities in preterm birth and guide productive research endeavors and, ultimately, effective clinical and public health interventions.

Journal Article

Adaptations of Avian Flu Virus Are a Cause for Concern

by Cohen, Murray L.; Enquist, Lynn W.; Lumpkin, John R. in Advisory boards, Avian flu, Biological and medical sciences

2012

Members of the National Science Advisory Board for Biosecurity explain its recommendations on the communication of experimental work on H5N1 influenza. We are in the midst of a revolutionary period in the life sciences. Technological capabilities have dramatically expanded, we have a much improved understanding of the complex biology of selected microorganisms, and we have a much improved ability to manipulate microbial genomes. With this has come unprecedented potential for better control of infectious diseases and significant societal benefit. However, there is also a growing risk that the same science will be deliberately misused and that the consequences could be catastrophic. Efforts to describe or define life-sciences research of particular concern have focused on the possibility that knowledge or products derived from such research, or new technologies, could be directly misapplied with a sufficiently broad scope to affect national or global security. Research that might greatly enhance the harm caused by microbial pathogens has been of special concern (1–3). Until now, these efforts have suffered from a lack of specificity and a paucity of concrete examples of “dual use research of concern” (3). Dual use is defined as research that could be used for good or bad purposes. We are now confronted by a potent, real-world example.

Journal Article

A novel exon 2 I27V VCP variant is associated with dissimilar clinical syndromes

by Mead, Simon; Rohrer, Jonathan D.; Reiman, David in Adenosine Triphosphatases - genetics, Aged, Algorithms

2011

Mutations in valosin-containing protein (VCP) are associated with a syndromic constellation of inclusion body myositis, Paget’s disease of bone and frontotemporal dementia. Here we describe the case reports of two patients with a novel variation (p.I27V) in the VCP gene that was not identified in a healthy control population. One patient presented with a frontotemporal dementia syndrome associated with raised serum alkaline phosphatase and a family history of progressive muscle disease and behavioural decline, while the second patient presented with isolated progressive dysarthria. Together these cases suggest a potential for the same VCP mutation to produce distinct patterns of brain damage, underlining the clinical heterogeneity of VCP-associated disease.

Journal Article
