Catalogue Search | MBRL
3,975 result(s) for "structural bioinformatics"
3D deep convolutional neural networks for amino acid environment similarity analysis
2017
Background
Central to protein biology is the understanding of how structural elements give rise to observed function. The surfeit of protein structural data enables development of computational methods to systematically derive rules governing structural-functional relationships. However, performance of these methods depends critically on the choice of protein structural representation. Most current methods rely on features that are manually selected based on knowledge about protein structures. These are often general-purpose but not optimized for the specific application of interest.
In this paper, we present a general framework that applies 3D convolutional neural network (3DCNN) technology to structure-based protein analysis. The framework automatically extracts task-specific features from the raw atom distribution, driven by supervised labels. As a pilot study, we use our network to analyze local protein microenvironments surrounding the 20 amino acids, and predict the amino acids most compatible with environments within a protein structure. To further validate the power of our method, we construct two amino acid substitution matrices from the prediction statistics and use them to predict effects of mutations in T4 lysozyme structures.
Results
Our deep 3DCNN achieves a two-fold increase in prediction accuracy compared to models that employ conventional hand-engineered features and successfully recapitulates known information about similar and different microenvironments. Models built from our predictions and substitution matrices achieve an 85% accuracy predicting outcomes of the T4 lysozyme mutation variants. Our substitution matrices contain rich information relevant to mutation analysis compared to well-established substitution matrices. Finally, we present a visualization method to inspect the individual contributions of each atom to the classification decisions.
Conclusions
End-to-end trained deep learning networks consistently outperform methods using hand-engineered features, suggesting that the 3DCNN framework is well suited for analysis of protein microenvironments and may be useful for other protein structural analyses.
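As a rough illustration of how a raw atom distribution can be presented to a 3D convolutional network, the sketch below bins atoms into a cubic voxel grid with one channel per element. The box size, voxel size, channel list, and the `voxelize` name are assumptions made for this example; the abstract does not state the authors' actual input encoding.

```python
from collections import defaultdict


def voxelize(atoms, center, box_size=20.0, voxel_size=1.0,
             channels=("C", "N", "O", "S")):
    """Count atoms per element channel in a cubic grid centred on `center`.

    atoms  -- iterable of (element, x, y, z) tuples, coordinates in angstroms
    center -- (x, y, z) of the microenvironment centre
    Returns (sparse_grid, n), where sparse_grid maps
    (channel_index, i, j, k) -> atom count and n is the grid edge length.
    All default values are illustrative assumptions.
    """
    n = int(box_size / voxel_size)
    half = box_size / 2.0
    grid = defaultdict(int)
    for element, x, y, z in atoms:
        if element not in channels:
            continue  # ignore elements without a channel (e.g. hydrogens)
        # Floor division keeps negative offsets outside the box.
        idx = tuple(int((coord - c + half) // voxel_size)
                    for coord, c in zip((x, y, z), center))
        if all(0 <= i < n for i in idx):
            grid[(channels.index(element), *idx)] += 1
    return dict(grid), n
```

A dense tensor for a deep learning framework could then be filled from the sparse counts; that step is omitted here to keep the sketch free of third-party dependencies.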
Journal Article
Defining a new nomenclature for the structures of active and inactive kinases
by Dunbrack, Roland L.; Modi, Vivek
in Algorithms, Biological Sciences, Biophysics and Computational Biology
2019
Targeting protein kinases is an important strategy for intervention in cancer. Inhibitors are directed at the active conformation or a variety of inactive conformations. While attempts have been made to classify these conformations, a structurally rigorous catalog of states has not been achieved. The kinase activation loop is crucial for catalysis and begins with the conserved DFG motif. This motif is observed in two major classes of conformations: DFGin, a set of active and inactive conformations where the Phe residue is in contact with the C-helix of the N-terminal lobe, and DFGout, an inactive form where Phe occupies the ATP site, exposing the C-helix pocket. We have developed a clustering of kinase conformations based on the location of the Phe side chain (DFGin, DFGout, and DFGinter or intermediate), the backbone dihedral angles of the sequence X-D-F, where X is the residue before the DFG motif, and the DFG-Phe side-chain rotamer, utilizing a density-based clustering algorithm. We have identified eight distinct conformations and labeled them based on the Ramachandran regions (A, alpha; B, beta; L, left) of the XDF motif and the Phe rotamer (minus, plus, trans). Our clustering divides the DFGin group into six clusters including BLAminus, which contains active structures, and two common inactive forms, BLBplus and ABAminus. DFGout structures are predominantly in the BBAminus conformation, which is essentially required for binding type II inhibitors. The inactive conformations have specific features that make them unable to bind ATP, magnesium, and/or substrates. Our structurally intuitive nomenclature will aid in understanding the conformational dynamics of kinases and structure-based development of kinase drugs.
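The naming scheme (three Ramachandran letters for X-D-F plus a Phe rotamer name) can be illustrated with a small sketch. Note that the authors assign structures by density-based clustering; the fixed angular cutoffs and the function names below are simplifying assumptions used only to show how a label such as BLAminus is composed.

```python
def rama_region(phi, psi):
    """Coarse Ramachandran region for one residue (angles in degrees).

    'L' = left-handed (phi > 0), 'A' = alpha, 'B' = beta.
    The boundaries are illustrative assumptions, not the paper's clusters.
    """
    if phi > 0:
        return "L"
    if -100 <= psi < 50:
        return "A"
    return "B"


def chi1_rotamer(chi1):
    """Name the Phe chi1 rotamer: minus (~-60), plus (~+60), trans (~180)."""
    chi1 = (chi1 + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if -120 <= chi1 < 0:
        return "minus"
    if 0 <= chi1 < 120:
        return "plus"
    return "trans"


def dfg_label(x_dihedrals, d_dihedrals, f_dihedrals, phe_chi1):
    """Compose a label from (phi, psi) of X, D, F and the Phe chi1 angle."""
    regions = "".join(rama_region(phi, psi)
                      for phi, psi in (x_dihedrals, d_dihedrals, f_dihedrals))
    return regions + chi1_rotamer(phe_chi1)


# Hypothetical angles chosen to fall in the beta, left, and alpha regions.
print(dfg_label((-120, 150), (60, 30), (-70, -40), -65))  # BLAminus
```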
Journal Article
AlphaFold2: A Role for Disordered Protein/Region Prediction?
by Karttunen, Mikko; Wilson, Carter J.; Choy, Wing-Yiu
in Accuracy, Datasets, Intrinsically Disordered Proteins - chemistry
2022
The development of AlphaFold2 marked a paradigm shift in the structural biology community. Herein, we assess the ability of AlphaFold2 to predict disordered regions against traditional sequence-based disorder predictors. We find that AlphaFold2 performs well at discriminating disordered regions, but also note that the disorder predictor one constructs from an AlphaFold2 structure determines accuracy. In particular, a naïve but non-trivial assumption that residues assigned to helices, strands, and H-bond stabilized turns are likely ordered and all other residues are disordered results in a dramatic overestimation of disorder; conversely, the predicted local distance difference test (pLDDT) provides an excellent measure of residue-wise disorder. Furthermore, by employing molecular dynamics (MD) simulations, we note an interesting relationship between the pLDDT and secondary structure that may explain our observations and suggests a broader application of the pLDDT for characterizing the local dynamics of intrinsically disordered proteins and regions (IDPs/IDRs).
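A pLDDT-based disorder predictor of the kind discussed here can be sketched as a simple per-residue threshold. The cutoff of 70 and both function names are assumptions for illustration; the abstract does not state which threshold the authors used.

```python
def disorder_from_plddt(plddt, threshold=70.0):
    """Return per-residue disorder calls: True where pLDDT < threshold.

    The default threshold is an illustrative assumption.
    """
    return [score < threshold for score in plddt]


def disordered_regions(calls, min_length=1):
    """Group consecutive True calls into 1-based inclusive (start, end) spans."""
    regions = []
    start = None
    for i, flag in enumerate(calls, start=1):
        if flag and start is None:
            start = i  # a disordered stretch begins
        elif not flag and start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a stretch that runs to the end of the chain.
    if start is not None and len(calls) - start + 1 >= min_length:
        regions.append((start, len(calls)))
    return regions


# Hypothetical per-residue pLDDT values for a seven-residue chain.
calls = disorder_from_plddt([95, 90, 60, 55, 50, 88, 40])
print(disordered_regions(calls))                # [(3, 5), (7, 7)]
print(disordered_regions(calls, min_length=2))  # [(3, 5)]
```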
Journal Article
Biotite: new tools for a versatile Python bioinformatics library
by Islam, Faisal; Greil, Maximilian; Hamacher, Kay
in Algorithms, alpha-Globins - chemistry, Applications programs
2023
Background
Biotite is a program library for sequence and structural bioinformatics written for the Python programming language. It implements widely used computational methods in a consistent and accessible package. This allows for easy combination of various data analysis, modeling and simulation methods.
Results
This article presents major functionalities introduced into Biotite since its original publication. The fields of application are shown using concrete examples. We show that the computational performance of Biotite for bioinformatics tasks is comparable to individual, special purpose software systems specifically developed for the respective single task.
Conclusions
The results show that Biotite can be used as a program library both to answer specific bioinformatics questions and to write entire, self-contained software applications with sufficient performance for general application.
Journal Article
PyTMs: a useful PyMOL plugin for modeling common post-translational modifications
2014
Background
Post-translational modifications (PTMs) constitute a major aspect of protein biology, particularly signaling events. Conversely, several different pathophysiological PTMs are hallmarks of oxidative imbalance or inflammatory states and are strongly associated with pathogenesis of autoimmune diseases or cancers. Accordingly, it is of interest to assess both the biological and structural effects of modification. For the latter, computer-based modeling offers an attractive option. We thus identified the need for easily applicable modeling options for PTMs.
Results
We developed PyTMs, a plugin implemented with the commonly used visualization software PyMOL. PyTMs enables users to introduce a set of common PTMs into protein/peptide models and can be used to address research questions related to PTMs. Ten types of modification are currently supported, including acetylation, carbamylation, citrullination, cysteine oxidation, malondialdehyde adducts, methionine oxidation, methylation, nitration, proline hydroxylation and phosphorylation. Furthermore, advanced settings integrate the pre-selection of surface-exposed atoms, define stereochemical alternatives and allow for basic structure optimization of the newly modified residues.
Conclusion
PyTMs is a useful, user-friendly modeling plugin for PyMOL. Advantages of PyTMs include standardized generation of PTMs, rapid time-to-result and facilitated user control. Although modeling cannot substitute for conventional structure determination, it constitutes a convenient tool that allows uncomplicated exploration of potential implications prior to experimental investments and basic explanation of experimental data. PyTMs is freely available as part of the PyMOL script repository project on GitHub and will further evolve.
Journal Article
PyPropel: a Python-based tool for efficiently processing and characterising protein data
2025
Background
The volume of protein sequence data has grown exponentially in recent years, driven by advancements in metagenomics. Despite this, a substantial proportion of these sequences remain poorly annotated, underscoring the need for robust bioinformatics tools to facilitate efficient characterisation and annotation for functional studies.
Results
We present PyPropel, a Python-based computational tool developed to streamline the large-scale analysis of protein data, with a particular focus on applications in machine learning. PyPropel integrates sequence and structural data pre-processing, feature generation, and post-processing for model performance evaluation and visualisation, offering a comprehensive solution for handling complex protein datasets.
Conclusion
PyPropel provides added value over existing tools by offering a unified workflow that encompasses the full spectrum of protein research, from raw data pre-processing to functional annotation and model performance analysis, thereby supporting efficient protein function studies.
Journal Article
Prop3D: A flexible, Python-based platform for machine learning with protein structural properties and biophysical data
2024
Background
Machine learning (ML) has a rich history in structural bioinformatics, and modern approaches, such as deep learning, are revolutionizing our knowledge of the subtle relationships between biomolecular sequence, structure, function, dynamics and evolution. As with any advance that rests upon statistical learning approaches, the recent progress in biomolecular sciences is enabled by the availability of vast volumes of sufficiently-variable data. To be useful, such data must be well-structured, machine-readable, intelligible and manipulable. These and related requirements pose challenges that become especially acute at the computational scales typical in ML. Furthermore, in structural bioinformatics such data generally relate to protein three-dimensional (3D) structures, which are inherently more complex than sequence-based data. A significant and recurring challenge concerns the creation of large, high-quality, openly-accessible datasets that can be used for specific training and benchmarking tasks in ML pipelines for predictive modeling projects, along with reproducible splits for training and testing.
Results
Here, we report ‘Prop3D’, a platform that allows for the creation, sharing and extensible reuse of libraries of protein domains, featurized with biophysical and evolutionary properties that can range from detailed, atomically-resolved physicochemical quantities (e.g., electrostatics) to coarser, residue-level features (e.g., phylogenetic conservation). As a community resource, we also supply a ‘Prop3D-20sf’ protein dataset, obtained by applying our approach to CATH. We have developed and deployed the Prop3D framework, both in the cloud and on local HPC resources, to systematically and reproducibly create comprehensive datasets via the Highly Scalable Data Service (HSDS). Our datasets are freely accessible via a public HSDS instance, or they can be used with accompanying Python wrappers for popular ML frameworks.
Conclusion
Prop3D and its associated Prop3D-20sf dataset can be of broad utility in at least three ways. Firstly, the Prop3D workflow code can be customized and deployed on various cloud-based compute platforms, with scalability achieved largely by saving the results to distributed HDF5 files via HSDS. Secondly, the linked Prop3D-20sf dataset provides a hand-crafted, already-featurized dataset of protein domains for 20 highly-populated CATH families; importantly, provision of this pre-computed resource can aid the more efficient development (and reproducible deployment) of ML pipelines. Thirdly, Prop3D-20sf’s construction explicitly takes into account (in creating datasets and data-splits) the enigma of ‘data leakage’, stemming from the evolutionary relationships between proteins.
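One common way to limit leakage between evolutionarily related proteins is to assign whole groups (for example, all domains sharing a family identifier) to either the training or the test set. The sketch below shows that generic idea; it is not the Prop3D-20sf procedure, which the abstract does not describe, and the identifiers and the `split_by_group` name are hypothetical.

```python
import hashlib


def split_by_group(domains, test_fraction=0.2):
    """Split (domain_id, group_id) pairs so no group spans both sets.

    Each group is hashed to a stable value in [0, 1); groups below
    `test_fraction` go to the test set. Because assignment is per group,
    the realised test fraction is only approximate.
    """
    train, test = [], []
    for domain_id, group_id in domains:
        digest = hashlib.sha256(group_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        (test if bucket < test_fraction else train).append(domain_id)
    return train, test


# Hypothetical domain and family identifiers.
domains = [("d1", "famA"), ("d2", "famA"), ("d3", "famB"), ("d4", "famC")]
train, test = split_by_group(domains)
```

Hashing the group identifier keeps the split reproducible across runs without storing a random seed.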
Journal Article
Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs)
2024
Accurate protein secondary structure prediction (PSSP) is crucial for understanding protein function, which is foundational to advancements in drug development, disease treatment, and biotechnology. Researchers gain critical insights into protein folding and function within cells by predicting protein secondary structures. The advent of deep learning models, capable of processing complex sequence data and identifying meaningful patterns, offers substantial potential to enhance the accuracy and efficiency of protein structure predictions. In particular, recent breakthroughs in deep learning, driven by the integration of natural language processing (NLP) algorithms, have significantly advanced the field of protein research. Inspired by the remarkable success of NLP techniques, this study harnesses the power of pre-trained language models (PLMs) to advance PSSP. We conduct a comprehensive evaluation of various deep learning models trained on distinct sequence embeddings, including one-hot encoding and PLM-based approaches such as ProtTrans and ESM-2, to develop a cutting-edge prediction system optimized for accuracy and computational efficiency. Our proposed model, Porter 6, is an ensemble of CBRNN-based predictors, leveraging the protein language model ESM-2 as input features. Porter 6 achieves outstanding performance on large-scale, independent test sets. On a 2022 test set, the model attains an impressive 86.60% accuracy in three-state (Q3) and 76.43% in eight-state (Q8) classifications. When tested on a more recent 2024 test set, Porter 6 maintains robust performance, achieving 84.56% in Q3 and 74.18% in Q8 classifications. This represents a significant 3% improvement over its predecessor, outperforming or matching state-of-the-art approaches in the field.
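The Q3 and Q8 figures quoted above are per-residue accuracies over three and eight secondary structure classes. A minimal sketch of how such scores can be computed follows; the eight-to-three reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is a widely used convention and is assumed here, since the abstract does not say which mapping Porter 6 uses.

```python
# Assumed reduction of DSSP-style eight-state labels to three states.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",            # helices
    "E": "E", "B": "E",                      # strands and bridges
    "T": "C", "S": "C", "C": "C", "-": "C",  # turns, bends, coil
}


def to_q3(q8_labels):
    """Collapse an eight-state label string to three states."""
    return "".join(Q8_TO_Q3[state] for state in q8_labels)


def accuracy(predicted, observed):
    """Percentage of residues whose predicted label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical ten-residue example.
observed_q8 = "HHGGEEBTSC"
predicted_q8 = "HHHHEEETTC"
print(accuracy(predicted_q8, observed_q8))                # 60.0 (Q8)
print(accuracy(to_q3(predicted_q8), to_q3(observed_q8)))  # 100.0 (Q3)
```

The example also shows why Q3 is always at least as high as Q8 for the same prediction: confusions within a merged class stop counting as errors.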
Journal Article
3DVizSNP: a tool for rapidly visualizing missense mutations identified in high throughput experiments in iCn3D
2023
Background
High throughput experiments in cancer and other areas of genomic research identify large numbers of sequence variants that need to be evaluated for phenotypic impact. While many tools exist to score the likely impact of single nucleotide polymorphisms (SNPs) based on sequence alone, the three-dimensional structural environment is essential for understanding the biological impact of a nonsynonymous mutation.
Results
We present a program, 3DVizSNP, that enables the rapid visualization of nonsynonymous missense mutations extracted from a variant call format (VCF) file using the web-based iCn3D visualization platform. The program, written in Python, leverages REST APIs and can be run locally without installing any other software or databases, or from a webserver hosted by the National Cancer Institute. It automatically selects the appropriate experimental structure from the Protein Data Bank, if available, or the predicted structure from the AlphaFold database, enabling users to rapidly screen SNPs based on their local structural environment. 3DVizSNP leverages iCn3D annotations and its structural analysis functions to assess changes in structural contacts associated with mutations.
Conclusions
This tool enables researchers to efficiently make use of 3D structural information to prioritize mutations for further computational and experimental impact assessment. The program is available as a webserver at https://analysistools.cancer.gov/3dvizsnp or as a standalone Python program at https://github.com/CBIIT-CGBB/3DVizSNP.
Journal Article
TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data
by Sydow, Dominique; Volkamer, Andrea; Morger, Andrea
in Bioinformatics, Chemistry, Chemistry and Materials Science
2019
Owing to the increase in freely available software and data for cheminformatics and structural bioinformatics, research for computer-aided drug design (CADD) is more and more built on modular, reproducible, and easy-to-share pipelines. While documentation for such tools is available, there are only a few freely accessible examples that teach the underlying concepts focused on CADD, especially addressing users new to the field. Here, we present TeachOpenCADD, a teaching platform developed by students for students, using open source compound and protein data as well as basic and CADD-related Python packages. We provide interactive Jupyter notebooks for central CADD topics, integrating theoretical background and practical code. TeachOpenCADD is freely available on GitHub: https://github.com/volkamerlab/TeachOpenCADD.
Journal Article