Catalogue Search | MBRL
2,137 result(s) for "Databases, Genetic - statistics"
Electronically ascertained extended pedigrees in breast cancer genetic counseling
2019
A comprehensive pedigree, usually provided by the counselee and verified by medical records, is essential for risk assessment in cancer genetic counseling. Collecting the relevant information is time-consuming and sometimes impossible. We studied the use of electronically ascertained pedigrees (EGPs). The study group comprised women (n = 1352) receiving HBOC genetic counseling between December 2006 and December 2016 at Landspitali in Iceland. EGPs were ascertained using information from the population-based Genealogy Database and Icelandic Cancer Registry. The likelihood of being positive for the Icelandic founder BRCA2 pathogenic variant NM_000059.3:c.767_771delCAAAT was calculated using the risk assessment program Boadicea. We used this unique data to estimate the optimal size of pedigrees, i.e., those that best balance the accuracy of risk assessment using Boadicea and cost of ascertainment. Sub-groups of 104 randomly selected women positive and 105 negative for the founder BRCA2 pathogenic variant (PV) were formed, and Receiver Operating Characteristic curves were compared for efficiency of PV prediction with a Boadicea score. The optimal pedigree size included 3° relatives or up to five generations, with an average of 53.8 individuals (range 9–220) (AUC 0.801). Adding 4° relatives did not improve the outcome. Pedigrees including 3° relatives are difficult and sometimes impossible to generate with conventional methods. Pedigrees ascertained with data from pre-existing genealogy databases and cancer registries can save effort and contain more information than traditional pedigrees. Genetic services should consider generating EGPs, which requires access to an accurate genealogy database and cancer registry. Local data protection laws and regulations have to be addressed.
Journal Article
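The entry above compares Receiver Operating Characteristic curves and reports an AUC. As a rough illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the example scores are hypothetical; they are not taken from the study or from the Boadicea program.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive scores higher than a
    random negative, counting ties as 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical risk scores for illustration only (not data from the study).
carriers = [0.9, 0.7, 0.4]
non_carriers = [0.5, 0.2, 0.1]
print(auc(carriers, non_carriers))
```

An AUC of 0.5 corresponds to a score that ranks cases no better than chance, and 1.0 to perfect separation. The pairwise loop is quadratic in group size, which is acceptable for groups of about a hundred, as in the sub-groups described above.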
The support of human genetic evidence for approved drug indications
2015
Matthew Nelson and colleagues investigate how well genetic evidence for disease susceptibility predicts drug mechanisms. They find a correlation between gene products that are successful drug targets and genetic loci associated with the disease treated by the drug and predict that selecting genetically supported targets could increase the success rate of drugs in clinical development.
Over a quarter of drugs that enter clinical development fail because they are ineffective. Growing insight into genes that influence human disease may affect how drug targets and indications are selected. However, there is little guidance about how much weight should be given to genetic evidence in making these key decisions. To answer this question, we investigated how well the current archive of genetic evidence predicts drug mechanisms. We found that, among well-studied indications, the proportion of drug mechanisms with direct genetic support increases significantly across the drug development pipeline, from 2.0% at the preclinical stage to 8.2% among mechanisms for approved drugs, and varies dramatically among disease areas. We estimate that selecting genetically supported targets could double the success rate in clinical development. Therefore, using the growing wealth of human genetic data to select the best targets and indications should have a measurable impact on the successful development of new drugs.
Journal Article
RESCRIPt: Reproducible sequence taxonomy reference database management
by Robeson, Michael S.; Bokulich, Nicholas A.; Dillon, Matthew R.
in Animals, Biology and Life Sciences, Classification
2021
Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. 
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.
Journal Article
Strelka2: fast and accurate calling of germline and somatic variants
2018
We describe Strelka2 (https://github.com/Illumina/strelka), an open-source small-variant-calling method for research and clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model-based estimation of insertion/deletion error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperformed the current leading tools in terms of both variant-calling accuracy and computing cost.
Journal Article
Polygenic prediction via Bayesian regression and continuous shrinkage priors
2019
Polygenic risk scores (PRS) have shown promise in predicting human complex traits and diseases. Here, we present PRS-CS, a polygenic prediction method that infers posterior effect sizes of single nucleotide polymorphisms (SNPs) using genome-wide association summary statistics and an external linkage disequilibrium (LD) reference panel. PRS-CS utilizes a high-dimensional Bayesian regression framework, and is distinct from previous work by placing a continuous shrinkage (CS) prior on SNP effect sizes, which is robust to varying genetic architectures, provides substantial computational advantages, and enables multivariate modeling of local LD patterns. Simulation studies using data from the UK Biobank show that PRS-CS outperforms existing methods across a wide range of genetic architectures, especially when the training sample size is large. We apply PRS-CS to predict six common complex diseases and six quantitative traits in the Partners HealthCare Biobank, and further demonstrate the improvement of PRS-CS in prediction accuracy over alternative methods.
Polygenic risk scores (PRS) have the potential to predict complex diseases and traits from genetic data. Here, Ge et al. develop PRS-CS which uses a Bayesian regression framework, continuous shrinkage (CS) priors and an external LD reference panel for polygenic prediction of binary and quantitative traits from GWAS summary statistics.
Journal Article
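For readers unfamiliar with polygenic risk scores, the final scoring step is commonly a weighted sum of allele dosages, with per-SNP effect sizes as the weights. The sketch below shows only that generic step. It does not implement PRS-CS or its continuous shrinkage prior, and the function name `polygenic_score` and the numbers are hypothetical.

```python
from typing import Sequence


def polygenic_score(effect_sizes: Sequence[float], dosages: Sequence[float]) -> float:
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele).

    effect_sizes: per-SNP weights, e.g. posterior effect sizes from some
                  upstream method (assumed to be given).
    dosages:      one individual's allele counts, in the same SNP order.
    """
    if len(effect_sizes) != len(dosages):
        raise ValueError("effect_sizes and dosages must have the same length")
    return sum(beta * dose for beta, dose in zip(effect_sizes, dosages))


# Hypothetical weights and genotypes for illustration only.
weights = [0.5, -0.25, 0.125]
genotype = [2, 1, 0]
print(polygenic_score(weights, genotype))
```

Methods such as the one described above differ mainly in how the weights are estimated from GWAS summary statistics; the scoring step itself stays this simple.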
A synthetic-diploid benchmark for accurate variant-calling evaluation
2018
Existing benchmark datasets for use in evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers, and they are thus biased toward easy regions that are accessible by these algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a relatively more accurate and less biased estimate of small-variant-calling error rates in a realistic context.
Journal Article
Reuse of public genome-wide gene expression data
2013
Key Points
Over the past decade, high-throughput gene expression experiments have generated data from millions of assays. Data sets linked to publications are stored in functional genomics data archives: ArrayExpress at the European Bioinformatics Institute, Gene Expression Omnibus at the US National Center for Biotechnology Information and the DNA Databank of Japan Omics Archive.
Secondary added-value and topical databases process data from the primary archives, adding analysis and annotation to make these data accessible to every biologist by allowing queries such as 'in which tissue is a particular gene expressed?' or 'which genes are differentially expressed between a particular disease and normal samples?'
Public gene expression data are commonly reused to study biological questions, both by reanalysis of primary data and by queries to secondary resources. Approximately half of the studies that use public gene expression data rely solely on existing data without adding newly generated data, and half of them use the public data in combination with new data.
The reproducibility of published microarray-based studies is limited, mostly owing to insufficient experiment annotation and sometimes to unavailability of the raw or processed data. Stricter enforcement of Minimum Information About a Microarray Experiment (MIAME) requirements and the development of easy-to-use experiment annotation tools are needed to achieve better reproducibility.
Although most of the public gene expression data are still based on microarray experiments, the contribution of high-throughput-sequencing-based expression studies, known as RNA sequencing (RNA-seq), is growing rapidly.
Reuse of RNA-seq data can potentially be even more valuable than reuse of microarray data, partly owing to the costs of experiments and data storage but even more importantly because of a more quantitative nature of sequencing-based expression data. Community standards such as Minimum Information about Sequencing Experiments (MINSEQE) should be adopted to make RNA-seq data maximally reusable.
The bioinformatics resources that store and manage public data are sensitive to short-term funding changes, complicating the maintenance of important databases. The development of long-term infrastructure in bioinformatics, such as the ELIXIR project in Europe, is needed to ensure the long-term availability of public data.
A wealth of microarray gene expression data and a growing volume of RNA sequencing data are now available in public databases. The authors look at how these data are being used and discuss considerations for how such data should be analysed and deposited and how data reuse could be improved.
Our understanding of gene expression has changed dramatically over the past decade, largely catalysed by technological developments. High-throughput experiments — microarrays and next-generation sequencing — have generated large amounts of genome-wide gene expression data that are collected in public archives. Added-value databases process, analyse and annotate these data further to make them accessible to every biologist. In this Review, we discuss the utility of the gene expression data that are in the public domain and how researchers are making use of these data. Reuse of public data can be very powerful, but there are many obstacles in data preparation and analysis and in the interpretation of the results. We will discuss these challenges and provide recommendations that we believe can improve the utility of such data.
Journal Article
African ancestry GWAS of dementia in a large military cohort identifies significant risk loci
2023
While genome wide association studies (GWASs) of Alzheimer’s Disease (AD) in European (EUR) ancestry cohorts have identified approximately 83 potentially independent AD risk loci, progress in non-European populations has lagged. In this study, data from the Million Veteran Program (MVP), a biobank which includes genetic data from more than 650,000 US Veteran participants, was used to examine dementia genetics in an African descent (AFR) cohort. A GWAS of Alzheimer’s disease and related dementias (ADRD), an expanded AD phenotype including dementias such as vascular and non-specific dementia that included 4012 cases and 18,435 controls age 60+ in AFR MVP participants was performed. A proxy dementia GWAS based on survey-reported parental AD or dementia (n = 4385 maternal cases, 2256 paternal cases, and 45,970 controls) was also performed. These two GWASs were meta-analyzed, and then subsequently compared and meta-analyzed with the results from a previous AFR AD GWAS from the Alzheimer’s Disease Genetics Consortium (ADGC). A meta-analysis of common variants across the MVP ADRD and proxy GWASs yielded GWAS significant associations in the region of APOE (p = 2.48 × 10^−101), in ROBO1 (rs11919682, p = 1.63 × 10^−8), and RNA RP11-340A13.2 (rs148433063, p = 8.56 × 10^−9). The MVP/ADGC meta-analysis yielded additional significant SNPs near known AD risk genes TREM2 (rs73427293, p = 2.95 × 10^−9), CD2AP (rs7738720, p = 1.14 × 10^−9), and ABCA7 (rs73505251, p = 3.26 × 10^−10), although the peak variants observed in these genes differed from those previously reported in EUR and AFR cohorts. Of the genes in or near suggestive or genome-wide significant associated variants, nine (CDA, SH2D5, DCBLD1, EML6, GOPC, ABCA7, ROS1, TMCO4, and TREM2) were differentially expressed in the brains of AD cases and controls. This represents the largest AFR GWAS of AD and dementia, finding non-APOE GWAS-significant common SNPs associated with dementia. Increasing representation of AFR participants is an important priority in genetic studies and may lead to increased insight into AD pathophysiology and reduce health disparities.
Journal Article
Analysing and interpreting DNA methylation data
2012
Key Points
Recent technological advances make it possible to map DNA methylation in essentially any cell type, tissue or organism.
Computational methods and software tools are essential for processing, analysing and interpreting large-scale DNA methylation data sets.
Tailored software tools are now available for processing data obtained with all common methods for genome-wide DNA methylation mapping (including bisulphite sequencing and the Infinium assay).
Bioinformatic methods for visualization of DNA methylation data facilitate quality assessment and help to pinpoint global trends in the data.
By combining stringent statistical methods with computational and experimental validation, researchers can establish accurate lists of differentially methylated regions for a phenotype of interest.
Biological interpretation of differential DNA methylation is aided by computational tools for data exploration and enrichment analysis.
Large community projects are currently generating reference epigenome maps for many different cell types; the interpretation of these maps will require a comprehensive effort in functional epigenomics.
The analysis and interpretation of genome-wide DNA methylation data poses unique bioinformatics challenges. In this article, the tools that are available for processing, visualizing and interpreting these epigenetic data sets are discussed, and the relative advantages of various methods are considered.
DNA methylation is an epigenetic mark that has suspected regulatory roles in a broad range of biological processes and diseases. The technology is now available for studying DNA methylation genome-wide, at a high resolution and in a large number of samples. This Review discusses relevant concepts, computational methods and software tools for analysing and interpreting DNA methylation data. It focuses on the bioinformatic challenges of large epigenome-mapping projects and epigenome-wide association studies, and it also highlights software tools that make genome-wide DNA methylation mapping more accessible for laboratories with limited bioinformatics experience.
Journal Article
Harnessing machine learning to guide phylogenetic-tree search algorithms
2021
Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides a means to safely discard a large part of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.
Likelihood optimization in phylogenetic tree reconstruction is computationally intensive, especially as the number of sequences and taxa included increases. Here, Azouri et al. show how an artificial intelligence approach can reduce computational time without losing accuracy of tree inference.
Journal Article