Catalogue Search | MBRL

Fast and accurate long-read assembly with wtdbg2

by Ruan, Jue; Li, Heng in 631/114/2785, 631/114/794, 631/1647/794

2020

Existing long-read assemblers require thousands of central processing unit hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a long-read assembler wtdbg2 (https://github.com/ruanjue/wtdbg2) that is 2–17 times as fast as published tools while achieving comparable contiguity and accuracy. It paves the way for population-scale long-read assembly in future. Wtdbg2 assembles genomes with comparable contiguity and accuracy to existing tools using long-read sequencing data, and is several times faster, especially for large genomes.

Journal Article

Population sequencing enhances understanding of tea plant evolution

by Cheng, Hao; Jin, Jiqiang; Zhao, Sheng in 45/22, 45/23, 45/90

2020

Tea is an economically important plant characterized by a large genome, high heterozygosity, and high species diversity. In this study, we assemble a 3.26-Gb high-quality chromosome-scale genome for the ‘Longjing 43’ cultivar of Camellia sinensis var. sinensis. Genomic resequencing of 139 tea accessions from around the world is used to investigate the evolution and phylogenetic relationships of tea accessions. We find that hybridization has increased the heterozygosity and wide-ranging gene flow among tea populations with the spread of tea cultivation. Population genetic and transcriptomic analyses reveal that during domestication, selection for disease resistance and flavor in C. sinensis var. sinensis populations has been stronger than that in C. sinensis var. assamica populations. This study provides resources for marker-assisted breeding of tea and sets the foundation for further research on tea genetics and evolution. Tea is an important beverage crop with a large and heterozygous genome. Here, the authors assemble the genome of the cultivar Longjing 43 and conduct a population genetics study to reveal divergent selection for disease resistance and flavor between the two variety groups.

Journal Article

Penaeid shrimp genome provides insights into benthic adaptation and frequent molting

by Xu, Peng; Zhang, Hongbin; Sagi, Amir in 45/22, 45/23, 45/90

2019

Crustacea, the subphylum of Arthropoda which dominates the aquatic environment, is of major importance in ecology and fisheries. Here we report the genome sequence of the Pacific white shrimp Litopenaeus vannamei, covering ~1.66 Gb (scaffold N50 605.56 Kb) with 25,596 protein-coding genes and a high proportion of simple sequence repeats (>23.93%). The expansion of genes related to vision and locomotion is probably central to its benthic adaptation. Frequent molting of the shrimp may be explained by an intensified ecdysone signal pathway through gene expansion and positive selection. As an important aquaculture organism, L. vannamei has been subjected to high selection pressure during the past 30 years of breeding, and this has had a considerable impact on its genome. Decoding the L. vannamei genome not only provides an insight into the genetic underpinnings of specific biological processes, but also provides valuable information for enhancing crustacean aquaculture. The Pacific white shrimp Litopenaeus vannamei is an important aquaculture species and a promising model for crustacean biology. Here, the authors provide a reference genome assembly, and show that gene expansion is involved in the regulation of frequent molting as well as benthic adaptation of the shrimp.

Journal Article

NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads

by Wang, Depeng; Sandoval, José R.; Hu, Jiang in Algorithms, Animal Genetics and Genomics, Bioinformatics

2024

Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.

Journal Article

The nearly complete genome of Ginkgo biloba illuminates gymnosperm evolution

by Wang, Guibin; Feng, Hu; Mu, Desheng in 631/449/2491, 631/449/2669, Annotations

2021

Gymnosperms are a unique lineage of plants that currently lack a high-quality reference genome due to their large genome size and high repetitive sequence content. Here, we report a nearly complete genome assembly for Ginkgo biloba with a genome size of 9.87 Gb, an N50 contig size of 1.58 Mb and an N50 scaffold size of 775 Mb. We were able to accurately annotate 27,832 protein-coding genes in total, superseding the inaccurate annotation of 41,840 genes in a previous draft genome assembly. We found that expansion of the G. biloba genome, accompanied by the notable extension of introns, was mainly caused by the insertion of long terminal repeats rather than the recent occurrence of whole-genome duplication events, in contrast to the findings of a previous report. We also identified candidate genes in the central pair, intraflagellar transport and dynein protein families that are associated with the formation of the spermatophore flagellum, which has been lost in all seed plants except ginkgo and cycads. The newly obtained Ginkgo genome provides new insights into the evolution of the gymnosperm genome. Analyses on a newly assembled, nearly complete genome of Ginkgo biloba revealed the cause of genome expansion and candidate genes associated with the formation of spermatophore flagellum in ginkgo, advancing our understanding about gymnosperm evolution.

Journal Article

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

by Wu, Shigang; Ma, Zhanshan (Sam); Ye, Chengxi in 631/114/794, 631/61/514, Computational Biology - methods

2016

The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces both of their sequencing requirements. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost.

Journal Article

Genome and single-cell RNA-sequencing of the earthworm Eisenia andrei identifies cellular mechanisms underlying regeneration

by Shao, Yong; Zhu, Helen He; Wang, Xiao-Bo in 38/32, 38/39, 38/77

2020

The earthworm is particularly fascinating to biologists because of its strong regenerative capacity. However, many aspects of its regeneration in nature remain elusive. Here we report chromosome-level genome, large-scale transcriptome and single-cell RNA-sequencing data during earthworm (Eisenia andrei) regeneration. We observe expansion of LINE2 transposable elements and gene families functionally related to regeneration (for example, EGFR, epidermal growth factor receptor) particularly for genes exhibiting differential expression during earthworm regeneration. Temporal gene expression trajectories identify transcriptional regulatory factors that are potentially crucial for initiating cell proliferation and differentiation during regeneration. Furthermore, early growth response genes related to regeneration are transcriptionally activated in both the earthworm and planarian. Meanwhile, single-cell RNA-sequencing provides insight into the regenerative process at a cellular level and finds that the largest proportion of cells present during regeneration are stem cells. The mechanisms regulating regeneration of the earthworm are unclear. Here, the authors use genomic and transcriptomic analysis of the earthworm Eisenia andrei together with Hi-C analysis to identify genes involved and show activation of LINE2 transposable elements on regeneration.

Journal Article

A super pan-genomic landscape of rice

by Wei, Zhaoran; Gao, Hongsheng; Sun, Zongyi in 38/43, 38/91, 45/23

2022

Pan-genomes from large natural populations can capture genetic diversity and reveal genomic complexity. Using de novo long-read assembly, we generated a graph-based super pan-genome of rice consisting of a 251-accession panel comprising both cultivated and wild species of Asian and African rice. Our pan-genome reveals extensive structural variations (SVs) and gene presence/absence variations. Additionally, our pan-genome enables the accurate identification of nucleotide-binding leucine-rich repeat genes and characterization of their inter- and intraspecific diversity. Moreover, we uncovered grain weight-associated SVs which specify traits by affecting the expression of their nearby genes. We characterized genetic variants associated with submergence tolerance, seed shattering and plant architecture and found independent selection for a common set of genes that drove adaptation and domestication in Asian and African rice. This super pan-genome facilitates pinpointing of lineage-specific haplotypes for trait-associated genes and provides insights into the evolutionary events that have shaped the genomic architecture of various rice species.

Journal Article

HiTE: a fast and accurate dynamic boundary adjustment approach for full-length transposable element detection and annotation

by Gao, Xin; Li, Yaohang; Ruan, Jue in 45/43, 631/114/2184, 631/114/2785

2024

Recent advancements in genome assembly have greatly improved the prospects for comprehensive annotation of Transposable Elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness, requiring extensive manual editing. In addition, the currently available gold-standard TE databases are not comprehensive, even for extensively studied species, highlighting the critical need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms RepeatModeler2, the state-of-the-art tool, across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are directly inserted within crucial genes, leading to direct alterations in gene expression. A Nextflow version of HiTE is also available, with enhanced parallelism, reproducibility, and portability. Existing methods for detecting transposable elements (TEs) in genome assemblies have limited accuracy and robustness, and the results often require extensive manual editing. Here, the authors present a fast and accurate dynamic boundary adjustment approach that improves detection and annotation of full-length TEs across various species.

Journal Article

KSNP: a fast de Bruijn graph-based haplotyping tool approaching data-in time cost

by Lin, Dongxiao; Zhu, Zexuan; Ruan, Jue in 631/1647/2217, 631/61/212/2302, Agriculture

2024

Long reads that cover more variants per read raise opportunities for accurate haplotype construction, whereas the genotype errors of single nucleotide polymorphisms pose great computational challenges for haplotyping tools. Here we introduce KSNP, an efficient haplotype construction tool based on the de Bruijn graph (DBG). KSNP leverages the ability of DBG in handling high-throughput erroneous reads to tackle the challenges. Compared to other notable tools in this field, KSNP achieves at least 5-fold speedup while producing comparable haplotype results. The time required for assembling human haplotypes is reduced to nearly the data-in time. Haplotyping is the process of distinguishing alleles inherited together on a chromosome, a crucial step in assembling and interpreting genome sequences. Here, the authors present a computationally efficient haplotype assembly tool for long read sequencing data.

Journal Article
