Modeling aspects of the language of life through transfer-learning protein sequences
by Rost, Burkhard; Elnaggar, Ahmed; Nechaev, Dmitrii; Wang, Yu; Dallago, Christian; Heinzinger, Michael; Matthes, Florian
in
Algorithms
/ Amino Acid Sequence
/ Amino acids
/ Analysis
/ Artificial intelligence
/ Artificial neural networks
/ Big data
/ Bioinformatics
/ Biological evolution
/ Biomedical and Life Sciences
/ Computational biology
/ Computational Biology - methods
/ Computational Biology/Bioinformatics
/ Computer Appl. in Life Sciences
/ Computer applications
/ Data management
/ Databases, Nucleic Acid
/ Databases, Protein
/ Deep learning
/ Evolution
/ Information processing
/ Language
/ Language Modeling
/ Learning algorithms
/ Life Sciences
/ Localization
/ Localization prediction
/ Machine Learning
/ Machine Learning and Artificial Intelligence in Bioinformatics
/ Machine learning for computational and systems biology
/ Microarrays
/ Microbiomes
/ Natural Language Processing
/ Neural networks
/ Neural Networks, Computer
/ Predictions
/ Principles
/ Protein structure
/ Proteins
/ Proteins - chemistry
/ Proteomics
/ Proteomics - methods
/ Research Article
/ Secondary structure
/ Secondary structure prediction
/ Sequence Analysis
/ Sequence Embedding
/ Sequences
/ Structure-function relationships
/ Syntax
/ Textbooks
/ Transfer Learning
2019
Journal Article
Overview
Background
Predicting protein function and structure from sequence is an important challenge for computational biology. For 26 years, most state-of-the-art approaches have combined machine learning and evolutionary information. However, for some applications, retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. The new methodology introduced here addresses both problems.
Results
We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo, taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1), and membrane-bound proteins were distinguished from water-soluble ones (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information and, for some proteins, even beat the best. Thus, SeqVec embeddings prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: whereas the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of the growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, e.g. microbiome or metaproteome analysis.
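The Results report per-residue and per-protein scores as Q3, Q8, Q10 and Q2 (the share of residues or proteins assigned the correct one of 3, 8, 10 or 2 classes), disorder prediction as MCC, and a run-time comparison. The Python sketch below spells out the standard textbook definitions of these quantities and the speed-up arithmetic. It is not the authors' evaluation code: the function names `q_accuracy`, `mcc` and `speed_up` are hypothetical, the toy state strings are invented, and how the study aggregated scores or derived its ± margins is not reproduced.

```python
from math import sqrt
from typing import Sequence


def q_accuracy(reference: Sequence, predicted: Sequence) -> float:
    """Q_n: fraction of items (residues or proteins) assigned the correct one
    of n classes, e.g. Q3 over three secondary-structure states."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(r == p for r, p in zip(reference, predicted))
    return correct / len(reference)


def mcc(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Matthews correlation coefficient for a binary label, e.g. 1 = disordered
    residue, 0 = ordered; returns 0.0 when any confusion-matrix margin is empty."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def speed_up(seconds_before: float, seconds_after: float) -> float:
    """How many times faster the second per-protein run time is."""
    return seconds_before / seconds_after


# Invented toy strings: H = helix, E = strand, C = other.
print(q_accuracy("HHHEECCC", "HHHECCCC"))            # 7 of 8 correct -> 0.875
print(mcc([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))   # (2*2 - 1*1) / 9 -> 0.333...
print(speed_up(120, 0.03))                           # ~2 min vs 0.03 s -> about 4000
```

Taking "about two minutes" as 120 s, the two averages quoted above differ by a factor of roughly 4000 per protein.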
Conclusion
Transfer learning succeeded in extracting information from unlabeled sequence databases that is relevant to various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available at the level of a single sequence.
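The Conclusion credits transfer learning: a language model trained on unlabeled sequences is reused unchanged as a feature extractor, and only small supervised models are trained on top (the Results call these simple neural networks). The sketch below shows that pipeline shape under loudly stated assumptions. The hypothetical `one_hot` encoder (the baseline the Results compare against) stands in for the frozen SeqVec/ELMo model, which is not reproduced here. Per-protein vectors are built by averaging per-residue vectors, an assumption since this abstract does not say how per-protein inputs were formed. A logistic-regression head replaces the study's networks, and the toy sequences and "membrane-like" labels are invented for illustration, not data from the study.

```python
from math import exp
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
Matrix = List[List[float]]


def one_hot(sequence: str) -> Matrix:
    """Hypothetical stand-in for a frozen encoder: one 20-dim row per residue.
    Letters outside the 20 standard amino acids (e.g. X) map to all zeros."""
    rows = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            row[index] = 1.0
        rows.append(row)
    return rows


def mean_pool(per_residue: Sequence[Sequence[float]]) -> List[float]:
    """Average an L x d per-residue matrix into one d-dim per-protein vector
    (assumed pooling step; works for any sequence length L >= 1)."""
    if not per_residue:
        raise ValueError("cannot pool an empty sequence")
    return [sum(column) / len(per_residue) for column in zip(*per_residue)]


def protein_features(encoder: Callable[[str], Matrix], sequence: str) -> List[float]:
    """Frozen encoder followed by pooling; nothing here is trained."""
    return mean_pool(encoder(sequence))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def train_head(features: List[List[float]], labels: List[int],
               epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit only a logistic-regression head with plain SGD on fixed features."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            # For log loss with a sigmoid output, d(loss)/d(logit) = p - y.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights: List[float], bias: float, x: List[float]) -> int:
    """Return 1 if the logit is positive (p > 0.5), else 0."""
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0.0)


# Invented toy task: 1 = hydrophobic-rich ("membrane-like"), 0 = charged/polar.
train_seqs = ["LLIVFLLAIV", "IVLLFAVLIL", "DEKRKDESQN", "KKDEENQRSD"]
train_labels = [1, 1, 0, 0]
X = [protein_features(one_hot, s) for s in train_seqs]
w, b = train_head(X, train_labels)

for query in ["VLILAFLIVL", "SQNKDRKEDE"]:
    print(query, predict(w, b, protein_features(one_hot, query)))
```

Swapping `one_hot` for any function that returns one vector per residue (for instance, real SeqVec embeddings computed elsewhere) leaves `mean_pool`, `train_head` and `predict` unchanged. That is the point of the design: the encoder stays fixed, and only the small head ever sees labels.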
Publisher
BioMed Central, BioMed Central Ltd, Springer Nature B.V, BMC
Subject
/ Analysis
/ Big data
/ Biomedical and Life Sciences
/ Computational Biology - methods
/ Computational Biology/Bioinformatics
/ Computer Appl. in Life Sciences
/ Language
/ Machine Learning and Artificial Intelligence in Bioinformatics
/ Machine learning for computational and systems biology
/ Proteins
/ Secondary structure prediction
/ Structure-function relationships
/ Syntax