Catalogue Search | MBRL
12 result(s) for "Landolin, Jane M."
Resolving the complexity of the human genome using single-molecule sequencing
by Boitano, Matthew; Landolin, Jane M.; Stamatoyannopoulos, John A.
in 45/23; 631/208/212/748; 631/208/726/649/2157
2015
Single-molecule, real-time DNA sequencing is used to analyse a haploid human genome (CHM1), thus closing or extending more than half of the remaining 164 euchromatic gaps in the human genome; the complete sequences of euchromatic structural variants (including inversions, complex insertions and tandem repeats) are resolved at the base-pair level, suggesting that a greater complexity of the human genome can now be accessed.
Deep-sequencing the human genome
The human genome is considered sequenced, yet more than 160 euchromatic gaps remain and many aspects of its structural variation are poorly understood. Evan Eichler and colleagues sequenced and analysed a haploid human genome (CHM1) using single-molecule, real-time (SMRT) DNA sequencing and by doing so closed — or in some cases extended — more than half of the remaining gaps. They also resolved the complete sequence of numerous euchromatic structural variants at the base-pair level, revealing inversions, complex insertions and long tracts of tandem repeats, some of them previously unknown. Thanks to the longer-read sequencing technology applied here, the complexity of the human genome that stems from variation of longer and more complex repetitive DNA can now be largely resolved.
The human genome is arguably the most complete mammalian reference assembly [1, 2, 3], yet more than 160 euchromatic gaps remain [4, 5, 6] and aspects of its structural variation remain poorly understood ten years after its completion [7, 8, 9]. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing [10]. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.
Journal Article
Highly accurate long-read HiFi sequencing data for five complex genomes
2020
The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.
Measurement(s)
DNA • genome • Metagenome
Technology Type(s)
DNA sequencing • PacBio Sequel System
Factor Type(s)
organism that had its genome sequenced
Sample Characteristic - Organism
Mus musculus • Rana muscosa • Fragaria x ananassa • Zea mays
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12855527
Journal Article
Assembling large genomes with single-molecule sequencing and locality-sensitive hashing
by Koren, Sergey; Berlin, Konstantin; Landolin, Jane M
in 45/23; 631/208/2156; 631/208/726/2001/1428
2015
An assembly algorithm that overlaps noisy long reads enables accurate and fast assembly of large genomes from single-molecule real-time sequences.
Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.
Journal Article
Long-read, whole-genome shotgun sequence data for five model organisms
by Babayan, Primo; Rapicavoli, Nicole A; Rank, David R
in 631/1647/334; 631/1647/514/1948; 631/61/212
2014
Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.
Design Type(s)
observation design • genome sequencing • Shotgun Sequencing
Measurement Type(s)
DNA sequencing
Technology Type(s)
PacBio RS II
Factor Type(s)
Sample Characteristic(s)
Escherichia coli str. K-12 substr. MG1655 • Saccharomyces cerevisiae W303 • Neurospora crassa OR74A • Neurospora crassa • Arabidopsis thaliana • Drosophila melanogaster
Machine-accessible metadata file describing the reported data (ISA-Tab format)
Journal Article
The developmental transcriptome of Drosophila melanogaster
by Artieri, Carlo G.; Landolin, Jane M.; Langton, Laura
in 631/136/334/1582/715; 631/208/212/2019; Alternative Splicing - genetics
2011
Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
Elements of gene function
Three papers in this issue of Nature report on the modENCODE initiative, which aims to characterize functional DNA elements in the fruitfly Drosophila melanogaster and the roundworm Caenorhabditis elegans. Kharchenko et al. present a genome-wide chromatin landscape of the fruitfly, based on 18 histone modifications. They describe nine prevalent chromatin states. Integrating these analyses with other data types reveals individual characteristics of different genomic elements. Graveley et al. have used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages of the fruitfly. Among the results are scores of new genes, coding and non-coding transcripts, as well as splicing and editing events. Finally, Nègre et al. have produced a map of the regulatory part of the fruitfly genome, defining a vast array of putative regulatory elements, such as enhancers, promoters, insulators and silencers.
As part of the modENCODE initiative, which aims to characterize functional DNA elements in D. melanogaster and C. elegans, this study uses RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages of the fruitfly. Among the results are scores of new genes, coding and non-coding transcripts, as well as splicing and editing events.
Journal Article
Correction: Corrigendum: Assembling large genomes with single-molecule sequencing and locality-sensitive hashing
by Koren, Sergey; Berlin, Konstantin; Chin, Chen-Shan
in Agriculture; Bioinformatics; Biomedical and Life Sciences
2015
Nat. Biotechnol. 33, 623–630 (2015); published online 25 May 2015; corrected after print 6 October 2015. In the version of this article initially published, equation 9 appeared incorrectly. The equation has been corrected in the HTML and PDF versions of the article.
Journal Article
Resolving the complexity of the human genome using single-molecule sequencing
by Huddleston, John; Dennis, Megan Y.; Malig, Maika
in DNA sequencing; Genetic research; Human genome
2015
Single-molecule, real-time DNA sequencing is used to analyse a haploid human genome (CHM1), thus closing or extending more than half of the remaining 164 euchromatic gaps in the human genome; the complete sequences of euchromatic structural variants (including inversions, complex insertions and tandem repeats) are resolved at the base-pair level, suggesting that a greater complexity of the human genome can now be accessed.
Journal Article
Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing
2014
We report reference-grade de novo assemblies of four model organisms and the human genome from single-molecule, real-time (SMRT) sequencing. Long-read SMRT sequencing is routinely used to finish microbial genomes, but the available assembly methods have not scaled well to larger genomes. Here we introduce the MinHash Alignment Process (MHAP) for efficient overlapping of noisy, long reads using probabilistic, locality-sensitive hashing. Together with Celera Assembler, MHAP was used to reconstruct the genomes of Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster, and human from high-coverage SMRT sequencing. The resulting assemblies include fully resolved chromosome arms and close persistent gaps in these important reference genomes, including heterochromatic and telomeric transition sequences. For D. melanogaster, MHAP achieved a 600-fold speedup relative to prior methods and a cloud computing cost of a few hundred dollars. These results demonstrate that single-molecule sequencing alone can produce near-complete eukaryotic genomes at modest cost.