1,255 result(s) for "Databases, Genetic - standards"
Perspectives on ENCODE
The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression [1]. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community. The authors summarize the history of the ENCODE Project, the achievements of ENCODE 1 and ENCODE 2, and how the new data generated and analysed in ENCODE 3 complement the previous phases.
Microbial resolution of whole genome shotgun and 16S amplicon metagenomic sequencing using publicly available NEON data
Microorganisms are ubiquitous in the biosphere, playing a crucial role in both the biogeochemistry of the planet and human health. However, identifying these microorganisms and defining their function are challenging. Widely used approaches in comparative metagenomics, 16S amplicon sequencing and whole genome shotgun sequencing (WGS), have provided access to DNA sequencing analysis to identify microorganisms and evaluate diversity and abundance in various environments. However, advances in parallel high-throughput DNA sequencing in the past decade have introduced major hurdles, namely standardization of methods, data storage, reproducible interoperability of results, and data sharing. The National Ecological Observatory Network (NEON), established by the National Science Foundation, enables all researchers to address queries on a regional to continental scale around a variety of environmental challenges and provide high-quality, integrated, and standardized data from field sites across the U.S. As the amount of metagenomic data continues to grow, standardized procedures that allow results across projects to be assessed and compared are becoming increasingly important in the field of metagenomics. We demonstrate the feasibility of using publicly available NEON soil metagenomic sequencing datasets in combination with open access Metagenomics Rapid Annotation using the Subsystem Technology (MG-RAST) server to illustrate advantages of WGS compared to 16S amplicon sequencing. Four WGS and four 16S amplicon sequence datasets, from surface soil samples prepared by NEON investigators, were selected for comparison, using standardized protocols collected at the same locations in Colorado between April and July 2014. The dominant bacterial phyla detected across samples agreed between sequencing methodologies. However, WGS yielded greater microbial resolution, increased accuracy, and allowed identification of more genera of bacteria, archaea, viruses, and eukaryota, and putative functional genes that would have gone undetected using 16S amplicon sequencing. NEON open data will be useful for future studies characterizing and quantifying complex ecological processes associated with changing aquatic and terrestrial ecosystems.
A synthetic-diploid benchmark for accurate variant-calling evaluation
Existing benchmark datasets for use in evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers, and they are thus biased toward easy regions that are accessible by these algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a relatively more accurate and less biased estimate of small-variant-calling error rates in a realistic context.
A standardized archaeal taxonomy for the Genome Taxonomy Database
The accrual of genomic data from both cultured and uncultured microorganisms provides new opportunities to develop systematic taxonomies based on evolutionary relationships. Previously, we established a bacterial taxonomy through the Genome Taxonomy Database. Here, we propose a standardized archaeal taxonomy that is derived from a 122-concatenated-protein phylogeny that resolves polyphyletic groups and normalizes ranks based on relative evolutionary divergence. The resulting archaeal taxonomy, which forms part of the Genome Taxonomy Database, is stable for a range of phylogenetic variables including marker gene selection, inference methods, corrections for rate heterogeneity and compositional bias, tree rooting scenarios and expansion of the genome database. Rank normalization is shown to robustly correct for substitution rates varying up to 30-fold using simulated datasets. Taxonomic curation follows the rules of the International Code of Nomenclature of Prokaryotes while taking into account proposals to formally recognize the rank of phylum and to use genome sequences as type material. This taxonomy is based on 2,392 archaeal genomes, 93.3% of which required one or more changes to their existing taxonomy, mainly owing to incomplete classification. We identify 16 archaeal phyla and reclassify 3 major monophyletic units from the former Euryarchaeota and one phylum that unites the Thaumarchaeota–Aigarchaeota–Crenarchaeota–Korarchaeota (TACK) superphylum into a single phylum. Resolving widespread incomplete and uneven archaeal classifications is achieved by a rank-normalized genome-based taxonomy.
Reuse of public genome-wide gene expression data
Key Points: Over the past decade, high-throughput gene expression experiments have generated data from millions of assays. Data sets linked to publications are stored in functional genomics data archives: ArrayExpress at the European Bioinformatics Institute, Gene Expression Omnibus at the US National Center for Biotechnology Information and the DNA Databank of Japan Omics Archive. Secondary added-value and topical databases process data from the primary archives, adding analysis and annotation to make these data accessible to every biologist by allowing queries such as 'in which tissue is a particular gene expressed?' or 'which genes are differentially expressed between a particular disease and normal samples?' Public gene expression data are commonly reused to study biological questions, both by reanalysis of primary data and by queries to secondary resources. Approximately half of the studies that use public gene expression data rely solely on existing data without adding newly generated data, and half of them use the public data in combination with new data. The reproducibility of published microarray-based studies is limited, mostly owing to insufficient experiment annotation and sometimes to unavailability of the raw or processed data. A stricter enforcement of Minimum Information About a Microarray Experiment (MIAME) requirements and also development of easy-to-use experiment annotation tools are needed to achieve better reproducibility. Although most of the public gene expression data are still based on microarray experiments, the contribution of high-throughput-sequencing-based expression studies, known as RNA sequencing (RNA-seq), is growing rapidly. Reuse of RNA-seq data can potentially be even more valuable than reuse of microarray data, partly owing to the costs of experiments and data storage but even more importantly because of a more quantitative nature of sequencing-based expression data. Community standards such as Minimum Information about Sequencing Experiments (MINSEQE) should be adopted to make RNA-seq data maximally reusable. The bioinformatics resources that store and manage public data are sensitive to short-term funding changes, complicating the maintenance of important databases. The development of long-term infrastructure in bioinformatics, such as the ELIXIR project in Europe, is needed to ensure the long-term availability of public data. A wealth of microarray gene expression data and a growing volume of RNA sequencing data are now available in public databases. The authors look at how these data are being used and discuss considerations for how such data should be analysed and deposited and how data reuse could be improved. Our understanding of gene expression has changed dramatically over the past decade, largely catalysed by technological developments. High-throughput experiments (microarrays and next-generation sequencing) have generated large amounts of genome-wide gene expression data that are collected in public archives. Added-value databases process, analyse and annotate these data further to make them accessible to every biologist. In this Review, we discuss the utility of the gene expression data that are in the public domain and how researchers are making use of these data. Reuse of public data can be very powerful, but there are many obstacles in data preparation and analysis and in the interpretation of the results. We will discuss these challenges and provide recommendations that we believe can improve the utility of such data.
Genomics: data sharing needs an international code of conduct
Efforts to protect people’s privacy in a massive international cancer project offer lessons for data sharing.
GISAID in crisis: can the controversial COVID genome database survive?
The most popular repository for sharing SARS-CoV-2 sequence data has come under increasing scrutiny. Scientists and funders around the world must now consider what lies ahead for the open sharing of genome data.
Trust in genomic data sharing among members of the general public in the UK, USA, Canada and Australia
Trust may be important in shaping public attitudes to genetics and intentions to participate in genomics research and big data initiatives. As such, we examined trust in data sharing among the general public. A cross-sectional online survey collected responses from representative publics in the USA, Canada, UK and Australia (n = 8967). Participants were most likely to trust their medical doctor and less likely to trust other entities named. Company researchers were least likely to be trusted. Low, Variable and High Trust classes were defined using latent class analysis. Members of the High Trust class were more likely to be under 50 years, male, with children, hold religious beliefs, have personal experience of genetics and be from the USA. They were most likely to be willing to donate their genomic and health data for clinical and research uses. The Low Trust class were less reassured than other respondents by laws preventing exploitation of donated information. Variation in trust, its relation to areas of concern about the use of genomic data and potential of legislation are considered. These findings have relevance for efforts to expand genomic medicine and data sharing beyond those with personal experience of genetics or research participants.