2,044 result(s) for "Databases, Genetic - statistics"
Electronically ascertained extended pedigrees in breast cancer genetic counseling
A comprehensive pedigree, usually provided by the counselee and verified by medical records, is essential for risk assessment in cancer genetic counseling. Collecting the relevant information is time-consuming and sometimes impossible. We studied the use of electronically ascertained pedigrees (EGP). The study group comprised women (n = 1352) receiving HBOC genetic counseling between December 2006 and December 2016 at Landspitali in Iceland. EGPs were ascertained using information from the population-based Genealogy Database and Icelandic Cancer Registry. The likelihood of being positive for the Icelandic founder BRCA2 pathogenic variant (PV) NM_000059.3:c.767_771delCAAAT was calculated using the risk assessment program Boadicea. We used these unique data to estimate the optimal size of pedigrees, i.e., those that best balance the accuracy of risk assessment using Boadicea against the cost of ascertainment. Subgroups of 104 randomly selected women positive and 105 negative for the founder BRCA2 PV were formed, and Receiver Operating Characteristic curves were compared for efficiency of PV prediction with a Boadicea score. The optimal pedigree size included 3° relatives or up to five generations, with an average of 53.8 individuals (range 9–220) (AUC 0.801). Adding 4° relatives did not improve the outcome. Pedigrees including 3° relatives are difficult and sometimes impossible to generate with conventional methods. Pedigrees ascertained with data from pre-existing genealogy databases and cancer registries can save effort and contain more information than traditional pedigrees. Genetic services should consider generating EGPs, which requires access to an accurate genealogy database and cancer registry. Local data protection laws and regulations have to be addressed.
The support of human genetic evidence for approved drug indications
Matthew Nelson and colleagues investigate how well genetic evidence for disease susceptibility predicts drug mechanisms. They find a correlation between gene products that are successful drug targets and genetic loci associated with the disease treated by the drug and predict that selecting genetically supported targets could increase the success rate of drugs in clinical development. Over a quarter of drugs that enter clinical development fail because they are ineffective. Growing insight into genes that influence human disease may affect how drug targets and indications are selected. However, there is little guidance about how much weight should be given to genetic evidence in making these key decisions. To answer this question, we investigated how well the current archive of genetic evidence predicts drug mechanisms. We found that, among well-studied indications, the proportion of drug mechanisms with direct genetic support increases significantly across the drug development pipeline, from 2.0% at the preclinical stage to 8.2% among mechanisms for approved drugs, and varies dramatically among disease areas. We estimate that selecting genetically supported targets could double the success rate in clinical development. Therefore, using the growing wealth of human genetic data to select the best targets and indications should have a measurable impact on the successful development of new drugs.
RESCRIPt: Reproducible sequence taxonomy reference database management
Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. 
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.
Polygenic prediction via Bayesian regression and continuous shrinkage priors
Polygenic risk scores (PRS) have shown promise in predicting human complex traits and diseases. Here, we present PRS-CS, a polygenic prediction method that infers posterior effect sizes of single nucleotide polymorphisms (SNPs) using genome-wide association summary statistics and an external linkage disequilibrium (LD) reference panel. PRS-CS utilizes a high-dimensional Bayesian regression framework, and is distinct from previous work by placing a continuous shrinkage (CS) prior on SNP effect sizes, which is robust to varying genetic architectures, provides substantial computational advantages, and enables multivariate modeling of local LD patterns. Simulation studies using data from the UK Biobank show that PRS-CS outperforms existing methods across a wide range of genetic architectures, especially when the training sample size is large. We apply PRS-CS to predict six common complex diseases and six quantitative traits in the Partners HealthCare Biobank, and further demonstrate the improvement of PRS-CS in prediction accuracy over alternative methods. Polygenic risk scores (PRS) have the potential to predict complex diseases and traits from genetic data. Here, Ge et al. develop PRS-CS which uses a Bayesian regression framework, continuous shrinkage (CS) priors and an external LD reference panel for polygenic prediction of binary and quantitative traits from GWAS summary statistics.
Reuse of public genome-wide gene expression data
Key Points: Over the past decade, high-throughput gene expression experiments have generated data from millions of assays. Data sets linked to publications are stored in functional genomics data archives: ArrayExpress at the European Bioinformatics Institute, Gene Expression Omnibus at the US National Center for Biotechnology Information and the DNA Databank of Japan Omics Archive. Secondary added-value and topical databases process data from the primary archives, adding analysis and annotation to make these data accessible to every biologist by allowing queries such as 'in which tissue is a particular gene expressed?' or 'which genes are differentially expressed between a particular disease and normal samples?' Public gene expression data are commonly reused to study biological questions, both by reanalysis of primary data and by queries to secondary resources. Approximately half of the studies that use public gene expression data rely solely on existing data without adding newly generated data, and half of them use the public data in combination with new data. The reproducibility of published microarray-based studies is limited, mostly owing to insufficient experiment annotation and sometimes to unavailability of the raw or processed data. Stricter enforcement of Minimum Information About a Microarray Experiment (MIAME) requirements and the development of easy-to-use experiment annotation tools are needed to achieve better reproducibility. Although most of the public gene expression data are still based on microarray experiments, the contribution of high-throughput-sequencing-based expression studies, known as RNA sequencing (RNA-seq), is growing rapidly. Reuse of RNA-seq data can potentially be even more valuable than reuse of microarray data, partly owing to the costs of experiments and data storage but even more importantly because of the more quantitative nature of sequencing-based expression data.
Community standards such as Minimum Information about Sequencing Experiments (MINSEQE) should be adopted to make RNA-seq data maximally reusable. The bioinformatics resources that store and manage public data are sensitive to short-term funding changes, complicating the maintenance of important databases. The development of long-term infrastructure in bioinformatics, such as the ELIXIR project in Europe, is needed to ensure the long term availability of public data. A wealth of microarray gene expression data and a growing volume of RNA sequencing data are now available in public databases. The authors look at how these data are being used and discuss considerations for how such data should be analysed and deposited and how data reuse could be improved. Our understanding of gene expression has changed dramatically over the past decade, largely catalysed by technological developments. High-throughput experiments — microarrays and next-generation sequencing — have generated large amounts of genome-wide gene expression data that are collected in public archives. Added-value databases process, analyse and annotate these data further to make them accessible to every biologist. In this Review, we discuss the utility of the gene expression data that are in the public domain and how researchers are making use of these data. Reuse of public data can be very powerful, but there are many obstacles in data preparation and analysis and in the interpretation of the results. We will discuss these challenges and provide recommendations that we believe can improve the utility of such data.
A synthetic-diploid benchmark for accurate variant-calling evaluation
Existing benchmark datasets for use in evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers, and they are thus biased toward easy regions that are accessible by these algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a relatively more accurate and less biased estimate of small-variant-calling error rates in a realistic context.
African ancestry GWAS of dementia in a large military cohort identifies significant risk loci
While genome wide association studies (GWASs) of Alzheimer’s Disease (AD) in European (EUR) ancestry cohorts have identified approximately 83 potentially independent AD risk loci, progress in non-European populations has lagged. In this study, data from the Million Veteran Program (MVP), a biobank which includes genetic data from more than 650,000 US Veteran participants, was used to examine dementia genetics in an African descent (AFR) cohort. A GWAS of Alzheimer’s disease and related dementias (ADRD), an expanded AD phenotype including dementias such as vascular and non-specific dementia that included 4012 cases and 18,435 controls age 60+ in AFR MVP participants was performed. A proxy dementia GWAS based on survey-reported parental AD or dementia (n = 4385 maternal cases, 2256 paternal cases, and 45,970 controls) was also performed. These two GWASs were meta-analyzed, and then subsequently compared and meta-analyzed with the results from a previous AFR AD GWAS from the Alzheimer’s Disease Genetics Consortium (ADGC). A meta-analysis of common variants across the MVP ADRD and proxy GWASs yielded GWAS significant associations in the region of APOE (p = 2.48 × 10^-101), in ROBO1 (rs11919682, p = 1.63 × 10^-8), and RNA RP11-340A13.2 (rs148433063, p = 8.56 × 10^-9). The MVP/ADGC meta-analysis yielded additional significant SNPs near known AD risk genes TREM2 (rs73427293, p = 2.95 × 10^-9), CD2AP (rs7738720, p = 1.14 × 10^-9), and ABCA7 (rs73505251, p = 3.26 × 10^-10), although the peak variants observed in these genes differed from those previously reported in EUR and AFR cohorts. Of the genes in or near suggestive or genome-wide significant associated variants, nine (CDA, SH2D5, DCBLD1, EML6, GOPC, ABCA7, ROS1, TMCO4, and TREM2) were differentially expressed in the brains of AD cases and controls. This represents the largest AFR GWAS of AD and dementia, finding non-APOE GWAS-significant common SNPs associated with dementia.
Increasing representation of AFR participants is an important priority in genetic studies and may lead to increased insight into AD pathophysiology and reduce health disparities.
Analysing and interpreting DNA methylation data
Key Points Recent technological advances make it possible to map DNA methylation in essentially any cell type, tissue or organism. Computational methods and software tools are essential for processing, analysing and interpreting large-scale DNA methylation data sets. Tailored software tools are now available for processing data obtained with all common methods for genome-wide DNA methylation mapping (including bisulphite sequencing and the Infinium assay). Bioinformatic methods for visualization of DNA methylation data facilitate quality assessment and help to pinpoint global trends in the data. By combining stringent statistical methods with computational and experimental validation, researchers can establish accurate lists of differentially methylated regions for a phenotype of interest. Biological interpretation of differential DNA methylation is aided by computational tools for data exploration and enrichment analysis. Large community projects are currently generating reference epigenome maps for many different cell types; the interpretation of these maps will require a comprehensive effort in functional epigenomics. The analysis and interpretation of genome-wide DNA methylation data poses unique bioinformatics challenges. In this article, the tools that are available for processing, visualizing and interpreting these epigenetic data sets are discussed, and the relative advantages of various methods are considered. DNA methylation is an epigenetic mark that has suspected regulatory roles in a broad range of biological processes and diseases. The technology is now available for studying DNA methylation genome-wide, at a high resolution and in a large number of samples. This Review discusses relevant concepts, computational methods and software tools for analysing and interpreting DNA methylation data. 
It focuses not only on the bioinformatic challenges of large epigenome-mapping projects and epigenome-wide association studies but also highlights software tools that make genome-wide DNA methylation mapping more accessible for laboratories with limited bioinformatics experience.
Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants
A two-marker combination of plastid rbcL and matK has previously been recommended as the core plant barcode, to be supplemented with additional markers such as plastid trnH–psbA and nuclear ribosomal internal transcribed spacer (ITS). To assess the effectiveness and universality of these barcode markers in seed plants, we sampled 6,286 individuals representing 1,757 species in 141 genera of 75 families (42 orders) by using four different methods of data analysis. These analyses indicate that (i) the three plastid markers showed high levels of universality (87.1–92.7%), whereas ITS performed relatively well (79%) in angiosperms but not so well in gymnosperms; (ii) in taxonomic groups for which direct sequencing of the marker is possible, ITS showed the highest discriminatory power of the four markers, and a combination of ITS and any plastid DNA marker was able to discriminate 69.9–79.1% of species, compared with only 49.7% with rbcL + matK; and (iii) where multiple individuals of a single species were tested, ascriptions based on ITS and plastid DNA barcodes were incongruent in some samples for 45.2% of the sampled genera (for genera with more than one species sampled). This finding highlights the importance of both sampling multiple individuals and using markers with different modes of inheritance. In cases where it is difficult to amplify and directly sequence ITS in its entirety, just using ITS2 is a useful backup because it is easier to amplify and sequence this subset of the marker. We therefore propose that ITS/ITS2 should be incorporated into the core barcode for seed plants.
Convergent functional genomics of schizophrenia: from comprehensive understanding to genetic risk prediction
We have used a translational convergent functional genomics (CFG) approach to identify and prioritize genes involved in schizophrenia, by gene-level integration of genome-wide association study data with other genetic and gene expression studies in humans and animal models. Using this polyevidence scoring and pathway analyses, we identify top genes (DISC1, TCF4, MBP, MOBP, NCAM1, NRCAM, NDUFV2, RAB18, as well as ADCYAP1, BDNF, CNR1, COMT, DRD2, DTNBP1, GAD1, GRIA1, GRIN2B, HTR2A, NRG1, RELN, SNAP-25, TNIK), brain development, myelination, cell adhesion, glutamate receptor signaling, G-protein–coupled receptor signaling and cAMP-mediated signaling as key to pathophysiology and as targets for therapeutic intervention. Overall, the data are consistent with a model of disrupted connectivity in schizophrenia, resulting from the effects of neurodevelopmental environmental stress on a background of genetic vulnerability. In addition, we show how the top candidate genes identified by CFG can be used to generate a genetic risk prediction score (GRPS) to aid schizophrenia diagnostics, with predictive ability in independent cohorts. The GRPS also differentiates classic age of onset schizophrenia from early onset and late-onset disease. We also show, in three independent cohorts, two European American and one African American, increasing overlap, reproducibility and consistency of findings from single-nucleotide polymorphisms to genes, then genes prioritized by CFG, and ultimately at the level of biological pathways and mechanisms. Finally, we compared our top candidate genes for schizophrenia from this analysis with top candidate genes for bipolar disorder and anxiety disorders from previous CFG analyses conducted by us, as well as findings from the fields of autism and Alzheimer’s disease. Overall, our work maps the genomic and biological landscape for schizophrenia, providing leads towards a better understanding of illness, diagnostics and therapeutics.
It also reveals the significant genetic overlap with other major psychiatric disorder domains, suggesting the need for improved nosology.