DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets
by Bateman, Alex; Russo, Elena Tea; Punta, Marco; Barone, Federico; Laio, Alessandro; Cozzini, Stefano
Subjects: Algorithms / Amino Acid Sequence / Annotations / Automatic classification / Biology and Life Sciences / Classification / Cluster Analysis / Clustering / Databases, Protein / Datasets / Density / Domains / Hypotheses / Identification and classification / Methods / Protein Domains / Protein families / Proteins / Proteins - genetics / Research and Analysis Methods / Sequences
2022
Journal Article
Overview
Proteins known only at the sequence level outnumber those with an experimental characterization by orders of magnitude. Classifying protein regions (domains) into homologous families can generate testable functional hypotheses for as-yet unannotated sequences. Existing domain family resources typically rely on at least some degree of manual curation: they grow slowly over time and leave a large fraction of protein sequence space unclassified. Here we describe the automatic clustering, by Density Peak Clustering, of UniRef50 v. 2017_07, a protein sequence database of approximately 23 million sequences. We radically re-implemented a pipeline we had previously developed so that it can handle millions of sequences and data volumes on the order of 3 terabytes. The modified pipeline, which we call DPCfam, finds ∼45,000 protein clusters in UniRef50. Our automatic classification corresponds closely to those of the Pfam and ECOD resources: in particular, about 81% of medium-to-large Pfam families and 72% of ECOD families can be mapped to clusters generated by DPCfam. In addition, our protocol finds more than 14,000 clusters consisting of protein regions with no Pfam annotation, which are therefore candidates for novel protein families. These results are available to the scientific community through a dedicated repository.
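The abstract names Density Peak Clustering without describing it. As a rough illustration only, here is a minimal sketch of the generic algorithm as introduced by Rodriguez and Laio (2014). It is not the DPCfam pipeline, and it does not reproduce how DPCfam measures similarity between sequences. The function `density_peak_clustering` and its parameters `d_c` (density cutoff) and `n_centers` are hypothetical. The choices of a hard cutoff kernel for the local density ρ and a fixed number of centers picked by largest ρ·δ are assumptions made to keep the example small.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def density_peak_clustering(
    points: Sequence[T],
    dist: Callable[[T, T], float],
    d_c: float,
    n_centers: int,
) -> List[int]:
    """Label points with generic Density Peak Clustering (hypothetical helper).

    rho[i]   number of other points closer than the cutoff d_c (cutoff kernel)
    delta[i] distance to the nearest point of higher density; for the densest
             point, its largest distance to any point
    Cluster centers are the points with the largest gamma = rho * delta.
    """
    if n_centers < 1:
        raise ValueError("n_centers must be at least 1")
    n = len(points)
    if n == 0:
        return []

    # Full pairwise distance matrix: O(n^2) memory, fine for a toy example.
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    # Local density with a hard cutoff kernel (self excluded).
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index so that
    # "higher density" is a strict total order.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])
            continue
        # Nearest point among those already visited (all of higher density).
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i] = d[i][j]
        nearest_higher[i] = j

    # The densest point has no denser neighbour to inherit a label from,
    # so it is always a center; the remaining centers have the largest gamma.
    gamma = [rho[i] * delta[i] for i in range(n)]
    others = sorted(order[1:], key=lambda i: (-gamma[i], i))
    centers = {order[0], *others[: n_centers - 1]}

    # Assign labels in density order: a center opens a new cluster, and any
    # other point takes the label of its nearest higher-density neighbour.
    labels = [-1] * n
    next_label = 0
    for i in order:
        if i in centers:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[nearest_higher[i]]
    return labels


if __name__ == "__main__":
    pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    print(density_peak_clustering(pts, lambda a, b: abs(a - b), d_c=0.15, n_centers=2))
    # prints [0, 0, 0, 1, 1, 1]
```

Passing the distance as a generic callable keeps the sketch independent of the data type. Applying it to protein regions would require some sequence-based dissimilarity. That dissimilarity is an assumption here and is not taken from the source. The quadratic distance matrix also makes clear why clustering ~23 million sequences called for the re-implementation the abstract mentions.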
Publisher
Public Library of Science (PLoS)