Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Item TypeItem Type
-
SubjectSubject
-
YearFrom:-To:
-
More FiltersMore FiltersSourceLanguage
Done
Filters
Reset
32
result(s) for
"Regier, Allison"
Sort by:
Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects
2018
Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.
Sharing of whole genome sequencing (WGS) data improves study scale and power, but data from different groups are often incompatible. Here, US genome centers and NIH programs define WGS data processing standards and a flexible validation method, facilitating collaboration in human genetics research.
Journal Article
Mitochondrial genome copy number measured by DNA sequencing in human blood is strongly associated with metabolic traits via cell-type composition differences
by
Freimer, Nelson
,
Kuusisto, Johanna
,
Ganel, Liron
in
Adult
,
Aged
,
Apoptosis Regulatory Proteins - genetics
2021
Background
Mitochondrial genome copy number (MT-CN) varies among humans and across tissues and is highly heritable, but its causes and consequences are not well understood. When measured by bulk DNA sequencing in blood, MT-CN may reflect a combination of the number of mitochondria per cell and cell-type composition. Here, we studied MT-CN variation in blood-derived DNA from 19184 Finnish individuals using a combination of genome (N = 4163) and exome sequencing (N = 19034) data as well as imputed genotypes (N = 17718).
Results
We identified two loci significantly associated with MT-CN variation: a common variant at the
MYB-HBS1L
locus (P = 1.6 × 10
−8
), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the
TMBIM1
gene (P = 3.0 × 10
−8
), which has been reported to protect against non-alcoholic fatty liver disease. We also found that MT-CN is strongly associated with insulin levels (P = 2.0 × 10
−21
) and other metabolic syndrome (metS)-related traits. Using a Mendelian randomization framework, we show evidence that MT-CN measured in blood is causally related to insulin levels. We then applied an MT-CN polygenic risk score (PRS) derived from Finnish data to the UK Biobank, where the association between the PRS and metS traits was replicated. Adjusting for cell counts largely eliminated these signals, suggesting that MT-CN affects metS via cell-type composition.
Conclusion
These results suggest that measurements of MT-CN in blood-derived DNA partially reflect differences in cell-type composition and that these differences are causally linked to insulin and related traits.
Journal Article
Breakpoint structure of the Anopheles gambiae 2Rb chromosomal inversion
2010
Background
Alternative arrangements of chromosome 2 inversions in
Anopheles gambiae
are important sources of population structure, and are associated with adaptation to environmental heterogeneity. The forces responsible for their origin and maintenance are incompletely understood. Molecular characterization of inversion breakpoints provides insight into how they arose, and provides the basis for development of molecular karyotyping methods useful in future studies.
Methods
Sequence comparison of regions near the cytological breakpoints of 2Rb allowed the molecular delineation of breakpoint boundaries. Comparisons were made between the standard 2R
+
b
arrangement in the
An. gambiae
PEST reference genome and the inverted 2R
b
arrangements in the
An. gambiae
M and S genome assemblies. Sequence differences between alternative 2R
b
arrangements were exploited in the design of a PCR diagnostic assay, which was evaluated against the known chromosomal banding pattern of laboratory colonies and field-collected samples from Mali and Cameroon.
Results
The breakpoints of the 7.55 Mb 2R
b
inversion are flanked by extensive runs of the same short (72 bp) tandemly organized sequence, which was likely responsible for chromosomal breakage and rearrangement. Application of the molecular diagnostic assay suggested that 2R
b
has a single common origin in
An. gambiae
and its sibling species,
Anopheles arabiensis
, and also that the standard arrangement (2R
+
b
) may have arisen twice through breakpoint reuse. The molecular diagnostic was reliable when applied to laboratory colonies, but its accuracy was lower in natural populations.
Conclusions
The complex repetitive sequence flanking the 2R
b
breakpoint region may be prone to structural and sequence-level instability. The 2R
b
molecular diagnostic has immediate application in studies based on laboratory colonies, but its usefulness in natural populations awaits development of complementary molecular tools.
Journal Article
High-throughput 454 resequencing for allele discovery and recombination mapping in Plasmodium falciparum
2011
Background
Knowledge of the origins, distribution, and inheritance of variation in the malaria parasite (
Plasmodium falciparum
) genome is crucial for understanding its evolution; however the 81% (A+T) genome poses challenges to high-throughput sequencing technologies. We explore the viability of the Roche 454 Genome Sequencer FLX (GS FLX) high throughput sequencing technology for both whole genome sequencing and fine-resolution characterization of genetic exchange in malaria parasites.
Results
We present a scheme to survey recombination in the haploid stage genomes of two sibling parasite clones, using whole genome pyrosequencing that includes a sliding window approach to predict recombination breakpoints. Whole genome shotgun (WGS) sequencing generated approximately 2 million reads, with an average read length of approximately 300 bp.
De novo
assembly using a combination of WGS and 3 kb paired end libraries resulted in contigs ≤ 34 kb. More than 8,000 of the 24,599 SNP markers identified between parents were genotyped in the progeny, resulting in a marker density of approximately 1 marker/3.3 kb and allowing for the detection of previously unrecognized crossovers (COs) and many non crossover (NCO) gene conversions throughout the genome.
Conclusions
By sequencing the 23 Mb genomes of two haploid progeny clones derived from a genetic cross at more than 30× coverage, we captured high resolution information on COs, NCOs and genetic variation within the progeny genomes. This study is the first to resequence progeny clones to examine fine structure of COs and NCOs in malaria parasites.
Journal Article
A draft human pangenome reference
2023
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals
1
. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
An initial draft of the human pangenome is presented and made publicly available by the Human Pangenome Reference Consortium; the draft contains 94 de novo haplotype assemblies from 47 ancestrally diverse individuals.
Journal Article
Pangenome graph construction from genome alignments with Minigraph-Cactus
2024
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph’s ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a
Drosophila melanogaster
pangenome.
Constructing genome graphs directly from genome assemblies overcomes single-reference bias.
Journal Article
Mapping and characterization of structural variation in 17,795 human genomes
2020
A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline
1
to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0–11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.
Structural variants in more than 17,000 human genomes are mapped and characterized using whole-genome sequencing, showing how this type of variation contributes to rare deleterious coding and noncoding alleles.
Journal Article
Semi-automated assembly of high-quality diploid human reference genomes
2022
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society
1
,
2
. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals
3
,
4
. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome
5
. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity
6
. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Which combination of current genome sequencing and assembly approaches results in high-quality, complete diploid genome assemblies is determined.
Journal Article
Recombination between heterologous human acrocentric chromosomes
by
de Lima, Leonardo Gomes
,
Koren, Sergey
,
Rubinstein, Boris
in
45/23
,
631/181/457/649
,
631/208/212/2304
2023
The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications
1
,
2
. Although the resolution of these regions in the first complete assembly of a human genome—the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13)—provided a model of their homology
3
, it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium
4
(HPRC), we find that contigs from all of the SAACs form a community. A variation graph
5
constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination
6
,
7
. The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations
8
, and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago
9
.
Comparisons within the human pangenome establish that homologous regions on short arms of heterologous human acrocentric chromosomes actively recombine, leading to the high rate of Robertsonian translocation breakpoints in these regions.
Journal Article
Increased mutation and gene conversion within human segmental duplications
by
Hoekzema, Kendra
,
Munson, Katherine M.
,
Paten, Benedict
in
45/23
,
631/181/2474
,
631/208/212/2304
2023
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data
1
,
2
. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions
3
,
4
. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have ‘relocated’ on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences
5
,
6
.
A study comparing the pattern of single-nucleotide variation between unique and duplicated regions of the human genome shows that mutation rate and interlocus gene conversion are elevated in duplicated regions.
Journal Article