Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Item TypeItem Type
-
SubjectSubject
-
YearFrom:-To:
-
More FiltersMore FiltersSourceLanguage
Done
Filters
Reset
8
result(s) for
"Muruganujan Anushya"
Sort by:
Protocol Update for large-scale genome and gene function analysis with the PANTHER classification system (v.14.0)
2019
The PANTHER classification system (http://www.pantherdb.org) is a comprehensive system that combines genomes, gene function classifications, pathways and statistical analysis tools to enable biologists to analyze large-scale genome-wide experimental data. The current system (PANTHER v.14.0) covers 131 complete genomes organized into gene families and subfamilies; evolutionary relationships between genes are represented in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models (HMMs)). The families and subfamilies are annotated with Gene Ontology (GO) terms, and sequences are assigned to PANTHER pathways. A suite of tools has been built to allow users to browse and query gene functions and analyze large-scale experimental data with a number of statistical tests. PANTHER is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. Since the protocol for using this tool (v.8.0) was originally published in 2013, there have been substantial improvements and updates in the areas of data quality, data coverage, statistical algorithms and user experience. This Protocol Update provides detailed instructions on how to analyze genome-wide experimental data in the PANTHER classification system.Here the authors provide an update to their 2013 protocol for using the PANTHER classification system, detailing how to analyze genome-wide experimental data with the newest version of PANTHER (v.14.0), with improvements in the areas of data quality, data coverage, statistical algorithms and user experience.
Journal Article
PEREGRINE: A genome-wide prediction of enhancer to gene relationships supported by experimental evidence
by
Mills, Caitlin
,
Muruganujan, Anushya
,
Marconett, Crystal N.
in
Biology and life sciences
,
Cell Line
,
Databases, Genetic
2020
Enhancers are powerful and versatile agents of cell-type specific gene regulation, which are thought to play key roles in human disease. Enhancers are short DNA elements that function primarily as clusters of transcription factor binding sites that are spatially coordinated to regulate expression of one or more specific target genes. These regulatory connections between enhancers and target genes can therefore be characterized as enhancer-gene links that can affect development, disease, and homeostatic cellular processes. Despite their implication in disease and the establishment of cell identity during development, most enhancer-gene links remain unknown. Here we introduce a new, publicly accessible database of predicted enhancer-gene links, PEREGRINE. The PEREGRINE human enhancer-gene links interactive web interface incorporates publicly available experimental data from ChIA-PET, eQTL, and Hi-C assays across 78 cell and tissue types to link 449,627 enhancers to 17,643 protein-coding genes. These enhancer-gene links are made available through the new Enhancer module of the PANTHER database and website where the user may easily access the evidence for each enhancer-gene link, as well as query by target gene and enhancer location.
Journal Article
Large-scale gene function analysis with the PANTHER classification system
by
Thomas, Paul D
,
Muruganujan, Anushya
,
Casagrande, John T
in
631/114/2398
,
631/114/2403
,
631/114/794
2013
The PANTHER (protein annotation through evolutionary relationship) classification system (
http://www.pantherdb.org/
) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
Journal Article
PhyloGenes: An online phylogenetics and functional genomics resource for plant gene function inference
2020
We aim to enable the accurate and efficient transfer of knowledge about gene function gained from Arabidopsis thaliana and other model organisms to other plant species. This knowledge transfer is frequently challenging in plants due to duplications of individual genes and whole genomes in plant lineages. Such duplications result in complex evolutionary relationships between related genes, which may have similar sequences but highly divergent functions. In such cases, functional inference requires more than a simple sequence similarity calculation. We have developed an online resource, PhyloGenes (phylogenes.org), that displays precomputed phylogenetic trees for plant gene families along with experimentally validated function information for individual genes within the families. A total of 40 plant genomes and 10 non‐plant model organisms are represented in over 8,000 gene families. Evolutionary events such as speciation and duplication are clearly labeled on gene trees to distinguish orthologs from paralogs. Nearly 6,000 families have at least one member with an experimentally supported annotation to a Gene Ontology (GO) molecular function or biological process term. By displaying experimentally validated gene functions associated to individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes facilitates the use of evolutionary history to determine the most likely function of genes that have not been experimentally characterized. Future work will enrich the resource by incorporating additional gene function datasets such as plant gene expression atlas data.
Journal Article
BDQC: a general-purpose analytics tool for domain-blind validation of Big Data
by
Muruganujan, Anushya
,
Campbell, David S
,
Hood, Leroy E
in
Aneuploidy
,
Big Data
,
Bioinformatics
2018
Translational biomedical research is generating exponentially more data: thousands of whole-genome sequences (WGS) are now available; brain data are doubling every two years. Analyses of Big Data, including imaging, genomic, phenotypic, and clinical data, present qualitatively new challenges as well as opportunities. Among the challenges is a proliferation in ways analyses can fail, due largely to the increasing length and complexity of processing pipelines. Anomalies in input data, runtime resource exhaustion or node failure in a distributed computation can all cause pipeline hiccups that are not necessarily obvious in the output. Flaws that can taint results may persist undetected in complex pipelines, a danger amplified by the fact that research is often concurrent with the development of the software on which it depends. On the positive side, the huge sample sizes increase statistical power, which in turn can shed new insight and motivate innovative analytic approaches. We have developed a framework for Big Data Quality Control (BDQC) including an extensible set of heuristic and statistical analyses that identify deviations in data without regard to its meaning (domain-blind analyses). BDQC takes advantage of large sample sizes to classify the samples, estimate distributions and identify outliers. Such outliers may be symptoms of technology failure (e.g., truncated output of one step of a pipeline for a single genome) or may reveal unsuspected \"signal\" in the data (e.g., evidence of aneuploidy in a genome). We have applied the framework to validate real-world WGS analysis pipelines. BDQC successfully identified data outliers representing various failure classes, including genome analyses missing a whole chromosome or part thereof, hidden among thousands of intermediary output files. These failures could then be resolved by reanalyzing the affected samples. BDQC both identified hidden flaws as well as yielded new insights into the data. BDQC is designed to complement quality software development practices. There are multiple benefits from the application of BDQC at all pipeline stages. By verifying input correctness, it can help avoid expensive computations on flawed data. Analysis of intermediary and final results facilitates recovery from aberrant termination of processes. All these computationally inexpensive verifications reduce cryptic analytical artifacts that could seriously preclude clinical-grade genome interpretation. BDQC is available at https://github.com/ini-bdds/bdqc.