Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
7,326 result(s) for "Genomics - statistics"
Integrating Hi-C links with assembly graphs for chromosome-scale assembly
2019
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, few open-source tools are available, and errors, particularly inversions and fusions across chromosomes, remain more frequent than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.
Journal Article
Tackling the widespread and critical impact of batch effects in high-throughput data
by Baggerly, Keith; Scharpf, Robert B.; Irizarry, Rafael A.
in 631/1647/1513; 631/1647/48; Agriculture
2010
Batch effects can lead to incorrect biological conclusions but are often overlooked. The authors show that batch effects are relevant to a range of high-throughput 'omics' data sets and are crucial to address. They also explain how batch effects can be mitigated.
High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.
Journal Article
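The confounding this entry describes can be sketched in a few lines of Python. This is a toy simulation with invented numbers, not the authors' analysis: when batch and biological group coincide, a purely technical batch shift looks like a large group effect, and crude per-batch mean-centering removes it (along with any real effect, because the design itself is broken).

```python
import random

random.seed(0)

def simulate(n=200, true_effect=0.0, batch_shift=2.0):
    """Toy data: group 0 is measured in batch 0, group 1 in batch 1,
    so batch is perfectly confounded with the outcome of interest."""
    samples = []
    for group in (0, 1):
        batch = group  # full confounding
        for _ in range(n):
            value = true_effect * group + batch_shift * batch + random.gauss(0, 1)
            samples.append((group, batch, value))
    return samples

def group_diff(samples):
    g0 = [v for g, _, v in samples if g == 0]
    g1 = [v for g, _, v in samples if g == 1]
    return sum(g1) / len(g1) - sum(g0) / len(g0)

def center_by_batch(samples):
    """Crude batch correction: subtract each batch's mean. With perfect
    confounding this also removes any real effect -- the study design,
    not the correction, is the problem."""
    out = []
    for batch in sorted({b for _, b, _ in samples}):
        vals = [v for _, b, v in samples if b == batch]
        mean = vals and sum(vals) / len(vals)
        out.extend((g, b, v - mean) for g, b, v in samples if b == batch)
    return out

data = simulate()
print(round(group_diff(data), 2))                   # near the batch shift of 2, despite a true effect of 0
print(round(group_diff(center_by_batch(data)), 2))  # ~0 after centering
```

The point of the sketch matches the abstract's warning: when batch correlates with the outcome of interest, the apparent effect is driven by laboratory conditions, and no post hoc correction can cleanly separate the two.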
Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions
2019
We introduce new statistical methods for analyzing genomic data sets that measure many effects in many conditions (for example, gene expression changes under many treatments). These new methods improve on existing methods by allowing for arbitrary correlations in effect sizes among conditions. This flexible approach increases power, improves effect estimates and allows for more quantitative assessments of effect-size heterogeneity compared to simple shared or condition-specific assessments. We illustrate these features through an analysis of locally acting variants associated with gene expression (cis expression quantitative trait loci (eQTLs)) in 44 human tissues. Our analysis identifies more eQTLs than existing approaches, consistent with improved power. We show that although genetic effects on expression are extensively shared among tissues, effect sizes can still vary greatly among tissues. Some shared eQTLs show stronger effects in subsets of biologically related tissues (for example, brain-related tissues), or in only one tissue (for example, testis). Our methods are widely applicable, computationally tractable for many conditions and available online.
Multivariate adaptive shrinkage (mash) is a method for estimating and testing multiple effects in multiple conditions. When applied to GTEx data, mash can be used to analyze sharing of eQTL effects by examining variation in effect sizes.
Journal Article
Public human microbiome data are dominated by highly developed countries
by Abdill, Richard J.; Blekhman, Ran; Adamowicz, Elizabeth M.
in Archives & records; Asia; Bangladesh
2022
The importance of sampling from globally representative populations has been well established in human genomics. In human microbiome research, however, we lack a full understanding of the global distribution of sampling in research studies. This information is crucial to better understand global patterns of microbiome-associated diseases and to extend the health benefits of this research to all populations. Here, we analyze the country of origin of all 444,829 human microbiome samples that are available from the world’s 3 largest genomic data repositories, including the Sequence Read Archive (SRA). The samples are from 2,592 studies of 19 body sites, including 220,017 samples of the gut microbiome. We show that more than 71% of samples with a known origin come from Europe, the United States, and Canada, including 46.8% from the US alone, despite the country representing only 4.3% of the global population. We also find that central and southern Asia is the most underrepresented region: Countries such as India, Pakistan, and Bangladesh account for more than a quarter of the world population but make up only 1.8% of human microbiome samples. These results demonstrate a critical need to ensure more global representation of participants in microbiome studies.
Journal Article
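The representation gap this study quantifies reduces to a simple comparison of each country's share of samples against its share of world population. A short Python sketch, using invented placeholder counts rather than the paper's data:

```python
# Hypothetical microbiome sample counts per country (illustrative only).
sample_counts = {"US": 208_000, "UK": 40_000, "China": 25_000, "India": 5_000}
# Approximate share of world population, in percent (also illustrative).
population_pct = {"US": 4.3, "UK": 0.8, "China": 17.8, "India": 17.9}

total = sum(sample_counts.values())
for country, n in sample_counts.items():
    sample_pct = 100 * n / total          # country's share of all samples
    ratio = sample_pct / population_pct[country]  # over/under-representation
    print(f"{country}: {sample_pct:.1f}% of samples, "
          f"{ratio:.1f}x its population share")
```

A ratio well above 1 marks over-representation (the paper's US case) and a ratio far below 1 marks under-representation (its central and southern Asia case).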
Computational methods for transcriptome annotation and quantification using RNA-seq
by Garber, Manuel; Guttman, Mitchell; Grabherr, Manfred G.
in 631/114/2184; 631/1647/514/1949; 631/208/212/2019
2011
High-throughput RNA sequencing (RNA-seq) promises a comprehensive picture of the transcriptome, allowing for the complete annotation and quantification of all genes and their isoforms across samples. Realizing this promise requires increasingly complex computational methods. These computational challenges fall into three main categories: (i) read mapping, (ii) transcriptome reconstruction and (iii) expression quantification. Here we explain the major conceptual and practical challenges, and the general classes of solutions for each category. Finally, we highlight the interdependence between these categories and discuss the benefits for different biological applications.
Journal Article
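Of the three categories this review names, expression quantification is the easiest to make concrete. Below is a toy TPM (transcripts per million) computation, a standard length-normalized expression unit; the gene names, lengths, and read counts are invented for illustration.

```python
# Toy input: gene -> (transcript length in bp, mapped read count).
# All values are invented for illustration.
genes = {
    "GENE_A": (1_000, 500),
    "GENE_B": (2_000, 500),
    "GENE_C": (500, 250),
}

def tpm(genes):
    # 1) Normalize counts by transcript length (reads per kilobase),
    #    so long transcripts do not look more expressed just by length.
    rpk = {g: reads / (length / 1_000) for g, (length, reads) in genes.items()}
    # 2) Rescale so the values sum to one million within the sample,
    #    making TPMs comparable across samples with different depths.
    per_million = 1_000_000 / sum(rpk.values())
    return {g: v * per_million for g, v in rpk.items()}

values = tpm(genes)
# GENE_A and GENE_C have the same length-normalized coverage (500 reads/kb),
# GENE_B half as much, so the TPMs come out 400000, 200000 and 400000.
```

Real quantifiers handle multi-mapping reads and isoform ambiguity, which is where the interdependence with read mapping and transcriptome reconstruction that the authors highlight comes in; this sketch shows only the final normalization step.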
Twelve quick steps for genome assembly and annotation in the classroom
by Chung, J. Sook; Nam, Bo-Hye; Jung, Hyungtaek
in Agricultural economics; Agricultural production; Animals
2020
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Journal Article
Managing incidental findings and research results in genomic research involving biobanks and archived data sets
by Kahn, Jeffrey P.; Wolf, Susan M.; Richardson, Henry S.
in 692/308/2056; 692/700/179; Biobanks
2012
Biobanks and archived data sets collecting samples and data have become crucial engines of genetic and genomic research. Unresolved, however, is what responsibilities biobanks should shoulder to manage incidental findings and individual research results of potential health, reproductive, or personal importance to individual contributors (using “biobank” here to refer both to collections of samples and collections of data). This article reports recommendations from a 2-year project funded by the National Institutes of Health. We analyze the responsibilities involved in managing the return of incidental findings and individual research results in a biobank research system (primary research or collection sites, the biobank itself, and secondary research sites). We suggest that biobanks shoulder significant responsibility for seeing that the biobank research system addresses the return question explicitly. When reidentification of individual contributors is possible, the biobank should work to enable the biobank research system to discharge four core responsibilities to (1) clarify the criteria for evaluating findings and the roster of returnable findings, (2) analyze a particular finding in relation to this, (3) reidentify the individual contributor, and (4) recontact the contributor to offer the finding. We suggest that findings that are analytically valid, reveal an established and substantial risk of a serious health condition, and are clinically actionable should generally be offered to consenting contributors. This article specifies 10 concrete recommendations, addressing new biobanks as well as those already in existence.
Genet Med 2012;14(4):361–384
Journal Article
Genomics for the world
by De La Vega, Francisco M.; Bustamante, Carlos D.; Burchard, Esteban G.
in 692/308/2056; 692/700/478; comment
2011
Medical genomics has focused almost entirely on those of European descent. Other ethnic groups must be studied to ensure that more people benefit, say Carlos D. Bustamante, Esteban González Burchard and Francisco M. De La Vega.
Journal Article
Convergent functional genomics of schizophrenia: from comprehensive understanding to genetic risk prediction
2012
We have used a translational convergent functional genomics (CFG) approach to identify and prioritize genes involved in schizophrenia, by gene-level integration of genome-wide association study data with other genetic and gene expression studies in humans and animal models. Using this polyevidence scoring and pathway analyses, we identify top genes (DISC1, TCF4, MBP, MOBP, NCAM1, NRCAM, NDUFV2, RAB18, as well as ADCYAP1, BDNF, CNR1, COMT, DRD2, DTNBP1, GAD1, GRIA1, GRIN2B, HTR2A, NRG1, RELN, SNAP-25, TNIK), brain development, myelination, cell adhesion, glutamate receptor signaling, G-protein–coupled receptor signaling and cAMP-mediated signaling as key to pathophysiology and as targets for therapeutic intervention. Overall, the data are consistent with a model of disrupted connectivity in schizophrenia, resulting from the effects of neurodevelopmental environmental stress on a background of genetic vulnerability. In addition, we show how the top candidate genes identified by CFG can be used to generate a genetic risk prediction score (GRPS) to aid schizophrenia diagnostics, with predictive ability in independent cohorts. The GRPS also differentiates classic age of onset schizophrenia from early-onset and late-onset disease. We also show, in three independent cohorts, two European American and one African American, increasing overlap, reproducibility and consistency of findings from single-nucleotide polymorphisms to genes, then genes prioritized by CFG, and ultimately at the level of biological pathways and mechanisms. Finally, we compared our top candidate genes for schizophrenia from this analysis with top candidate genes for bipolar disorder and anxiety disorders from previous CFG analyses conducted by us, as well as findings from the fields of autism and Alzheimer's disease. Overall, our work maps the genomic and biological landscape for schizophrenia, providing leads towards a better understanding of illness, diagnostics and therapeutics.
It also reveals the significant genetic overlap with other major psychiatric disorder domains, suggesting the need for improved nosology.
Journal Article
Quantifying the impact of public omics data
2019
The amount of omics data in the public domain is increasing every year. Modern science has become a data-intensive discipline. Innovative solutions for data management, data sharing, and for discovering novel datasets are therefore increasingly required. In 2016, we released the first version of the Omics Discovery Index (OmicsDI) as a light-weight system to aggregate datasets across multiple public omics data resources. OmicsDI aggregates genomics, transcriptomics, proteomics, metabolomics and multiomics datasets, as well as computational models of biological processes. Here, we propose a set of novel metrics to quantify the attention and impact of biomedical datasets. A complete framework (now integrated into OmicsDI) has been implemented in order to provide and evaluate those metrics. Finally, we propose a set of recommendations for authors, journals and data resources to promote an optimal quantification of the impact of datasets.
Increasing amounts of public omics data are important and valuable resources for the research community. Here, the authors develop a set of metrics to quantify the attention and impact of biomedical datasets and integrate them into the Omics Discovery Index (OmicsDI) framework.
Journal Article