Catalogue Search | MBRL

High-quality genome and methylomes illustrate features underlying evolutionary success of oaks

by Zhen, Ying; Fitz-Gibbon, Sorel T.; Salzberg, Steven L. in 38/23; 38/39; 38/91

2022

The genus Quercus, which emerged ∼55 million years ago during globally warm temperatures, diversified into ∼450 extant species. We present a high-quality de novo genome assembly of a California endemic oak, Quercus lobata, revealing features consistent with oak evolutionary success. Effective population size remained large throughout history despite declining since early Miocene. Analysis of 39,373 mapped protein-coding genes outlined copious duplications consistent with genetic and phenotypic diversity, both by retention of genes created during the ancient γ whole genome hexaploid duplication event and by tandem duplication within families, including numerous resistance genes and a very large block of duplicated DUF247 genes, which have been found to be associated with self-incompatibility in grasses. An additional surprising finding is that subcontext-specific patterns of DNA methylation associated with transposable elements reveal broadly-distributed heterochromatin in intergenic regions, similar to grasses. Collectively, these features promote genetic and phenotypic variation that would facilitate adaptability to changing environments. The genus Quercus (oaks) has diversified into over 450 species which often play dominant roles in the ecosystems in which they occur. Here the authors present a genome and methylome for a California endemic oak, Quercus lobata, and describe features relevant to its evolutionary success.

Journal Article

Major data analysis errors invalidate cancer microbiome findings

by Lu, Jennifer; Gihawi, Abraham; Cooper, Colin S. in Bacteria; bioinformatics; Bladder cancer

2023

We re-analyzed the data from a recent large-scale study that reported strong correlations between DNA signatures of microbial organisms and 33 different cancer types and that created machine-learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (i) errors in the genome database and the associated computational methods led to millions of false-positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (ii) errors in the transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine-learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well. Recent reports showing that human cancers have a distinctive microbiome have led to a flurry of papers describing microbial signatures of different cancer types. Many of these reports are based on flawed data that, upon re-analysis, completely overturns the original findings. The re-analysis conducted here shows that most of the microbes originally reported as associated with cancer were not present at all in the samples. The original report of a cancer microbiome and more than a dozen follow-up studies are, therefore, likely to be invalid.

Journal Article

A First Insight into the Genome of the Filter-Feeder Mussel Mytilus galloprovincialis

by Canchaya, Carlos; Posada, David; Murgarella, Maria in Adaptation, Biological - genetics; Animals; Aquaculture

2016

Mussels belong to the phylum Mollusca, one of the largest and most diverse taxa in the animal kingdom. Despite their importance in aquaculture and in biology in general, genomic resources from mussels are still scarce. To broaden and increase the genomic knowledge in this family, we carried out a whole-genome sequencing study of the cosmopolitan Mediterranean mussel (Mytilus galloprovincialis). We sequenced its genome (32X depth of coverage) on the Illumina platform using three pair-end libraries with different insert sizes. The large number of contigs obtained pointed out a highly complex genome of 1.6 Gb where repeated elements seem to be widespread (~30% of the genome), a feature that is also shared with other marine molluscs. Notwithstanding the limitations of our genome sequencing, we were able to reconstruct two mitochondrial genomes and predict 10,891 putative genes. A comparative analysis with other molluscs revealed a gene enrichment of gene ontology categories related to multixenobiotic resistance, glutamate biosynthetic process, and the maintenance of ciliary structures.

Journal Article

Chromosome-Scale Assembly of the Bread Wheat Genome Reveals Thousands of Additional Gene Copies

by Shumate, Alaina; Zimin, Aleksey V; Alonge, Michael in Agricultural research; Annotations; Assembly

2020

Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome-scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of nongap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2000 genes that were previously unplaced. We also discovered >5700 additional gene copies, facilitating the accurate annotation of functional gene duplications including at the Ppd-B1 photoperiod response locus.

Journal Article

CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure

by Erdogdu, Beril; Chao, Kuan-Hao; Minkin, Ilia in Algorithms; Animal Genetics and Genomics; Annotations

2023

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.

Journal Article

Sequencing and Assembly of the 22-Gb Loblolly Pine Genome

by Koriabine, Maxim; Langley, Charles H; Wegrzyn, Jill L in Deoxyribonucleic acid; Evergreen trees; Genome, Plant

2014

Conifers are the predominant gymnosperm. The size and complexity of their genomes has presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.

Journal Article

Assembly and annotation of an Ashkenazi human reference genome

by Salzberg, Steven L.; Wagner, Justin M.; Salit, Marc L. in Animal Genetics and Genomics; Annotations; Bioinformatics

2020

Background: Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases. Results: Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes. Conclusions: The Ash1 genome is presented as a reference for any genetic studies involving Ashkenazi Jewish individuals.

Journal Article

Deleterious heteroplasmic mitochondrial mutations are associated with an increased risk of overall and cancer-specific mortality

by Shi, Wen; Xie, Jiaqi; Puiu, Daniela in 631/208/721; 631/67/1990/283; 631/80/642/333

2023

Mitochondria carry their own circular genome and disruption of the mitochondrial genome is associated with various aging-related diseases. Unlike the nuclear genome, mitochondrial DNA (mtDNA) can be present at 1000s to 10,000s of copies in somatic cells and variants may exist in a state of heteroplasmy, where only a fraction of the DNA molecules harbors a particular variant. We quantify mtDNA heteroplasmy in 194,871 participants in the UK Biobank and find that heteroplasmy is associated with a 1.5-fold increased risk of all-cause mortality. Additionally, we functionally characterize mtDNA single nucleotide variants (SNVs) using a constraint-based score, mitochondrial local constraint score sum (MSS) and find it associated with all-cause mortality, and with the prevalence and incidence of cancer and cancer-related mortality, particularly leukemia. These results indicate that mitochondria may have a functional role in certain cancers, and mitochondrial heteroplasmic SNVs may serve as a prognostic marker for cancer, especially for leukemia. Mitochondrial DNA is known to exhibit heterogeneity of variants, even within a single cell. Here, the authors assessed this heteroplasmy of mitochondrial DNA within the UK Biobank cohort and showed that the presence of heteroplasmy and a functional score generated from heteroplasmic SNVs were associated with all-cause mortality and certain cancers.

Journal Article

Sequence of the Sugar Pine Megagenome

by Holtz-Morris, Ann E; de Jong, Pieter; Koriabine, Maxim in Basidiomycota - pathogenicity; Cronartium ribicola; DNA Transposable Elements

2016

Until very recently, complete characterization of the megagenomes of conifers has remained elusive. The diploid genome of sugar pine (Pinus lambertiana Dougl.) has a highly repetitive, 31 billion bp genome. It is the largest genome sequenced and assembled to date, and the first from the subgenus Strobus, or white pines, a group that is notable for having the largest genomes among the pines. The genome represents a unique opportunity to investigate genome “obesity” in conifers and white pines. Comparative analysis of P. lambertiana and P. taeda L. reveals new insights on the conservation, age, and diversity of the highly abundant transposable elements, the primary factor determining genome size. Like most North American white pines, the principal pathogen of P. lambertiana is white pine blister rust (Cronartium ribicola J.C. Fischer ex Raben.). Identification of candidate genes for resistance to this pathogen is of great ecological importance. The genome sequence afforded us the opportunity to make substantial progress on locating the major dominant gene for simple resistance hypersensitive response, Cr1. We describe new markers and gene annotation that are both tightly linked to Cr1 in a mapping population, and associated with Cr1 in unrelated sugar pine individuals sampled throughout the species’ range, creating a solid foundation for future mapping. This genomic variation and annotated candidate genes characterized in our study of the Cr1 region are resources for future marker-assisted breeding efforts as well as for investigations of fundamental mechanisms of invasive disease and evolutionary response.

Journal Article

Mitochondrial heteroplasmy improves risk prediction for myeloid neoplasms

by Platz, Elizabeth A.; Guallar, Eliseo; Shi, Wen in 692/4028/67/1990; 692/4028/67/1990/1673; 692/53/2423

2024

Clonal hematopoiesis of indeterminate potential is the primary pathogenic risk factor for myeloid neoplasms, while heteroplasmy (mutations in a subset of cellular mitochondrial DNA) is another marker of clonal expansion associated with hematological malignancies. We explore how these two markers relate and influence myeloid neoplasms incidence, and their role in risk stratification. We find that heteroplasmy is more common in individuals with clonal hematopoiesis of indeterminate potential, particularly those with higher variant allele fractions, multiple mutations, or spliceosome machinery mutations. Individuals with both markers have a higher risk of myeloid neoplasms than those with either alone. Furthermore, heteroplasmic variants with higher predicted deleteriousness increase the risk of myeloid neoplasms. Incorporating heteroplasmy in an existing risk score model for individuals with clonal hematopoiesis of indeterminate potential significantly improves sensitivity and better identifies high-risk groups. This suggests heteroplasmy as a clonal expansion marker and potentially as a biomarker for myeloid neoplasms development. The relationship between heteroplasmy and clonal hematopoiesis of indeterminate potential and its association with the incidence of myeloid neoplasms (MN) remains to be explored. Here, the authors suggest that heteroplasmy is a marker of clonal expansion and a significant risk factor for MN development.

Journal Article
