A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions

by Kalyanaraman, Ananth; Broschat, Shira L.; Abnousi, Armen

Subjects: Acids / Algorithms / Alignment / Amino acid sequence / Amino acid sequencing / Artificial intelligence / Biology and Life Sciences / Classification / Computer and Information Sciences / Conserved sequence / Conserved Sequence - genetics / Data bases / Data collection / Data processing / Databases, Protein / Engineering and Technology / Genomes / Identification / Learning algorithms / Machine learning / Methods / Microprocessors / Molecular biology / Physical Sciences / Pipelines / Protein Domains / Protein families / Protein Structure, Tertiary / Proteins / Research and Analysis Methods / Sensitivity / Sequence Alignment / Sequence Analysis, Protein / Sequence Homology, Amino Acid / Sequences / Software / Training

2016

Journal Article

Overview

Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges. In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (k-mers) to quickly detect conserved regions, and the power of machine learning is used to improve the prediction accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable. We have compared NADDA with Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set an average accuracy of 63% is achieved when compared to Pfam. A boost in results in comparison with InterPro demonstrates the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground-truth. On average NADDA shows comparable accuracy, more balanced sensitivity and specificity, and being alignment-free, is significantly faster. 
Excluding the one-time cost of training, runtimes on a single processor were 49s, 10,566s, and 456s for NADDA, ADDA, and MKDOM2, respectively, for a data set comprised of approximately 2500 sequences.
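
The k-mer idea mentioned in the overview can be illustrated with a small sketch. This is not the NADDA implementation: the function names, the toy sequences, and the choice of k = 3 are all assumptions made for illustration, and the machine learning and MapReduce stages are omitted. It only shows how counting exact k-mer matches across a collection, without any alignment, can highlight positions that are shared by many sequences.

```python
from collections import Counter


def kmer_sequence_counts(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each sequence contributes at most 1 per k-mer.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def position_profile(seq, counts, k):
    """For each start position in seq, report how many sequences share the k-mer there."""
    return [counts[seq[i:i + k]] for i in range(len(seq) - k + 1)]


# Toy sequences, made up for illustration; three of them share the motif "GKSTL".
toy = ["MAAGKSTLQR", "PPGKSTLVVA", "TTWGKSTLEE", "MNQRSVDYHC"]
counts = kmer_sequence_counts(toy, k=3)
profile = position_profile(toy[0], counts, k=3)
print(profile)
```

In this toy profile, the positions covering the shared motif stand out with higher counts than the flanking positions, which is the kind of signal a classifier could then use to label a position as conserved or not.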

Publisher

Public Library of Science (PLoS)
