121 result(s) for "Pedersen, Brent S."
A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar
Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are used for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of variant files. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem, and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how pangenome graph formats can address more complex variation that cannot easily be represented by the VCF format.
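As a rough illustration of what a VCF data line holds, this sketch parses the eight fixed columns of one tab-separated record in plain Python. It then classifies each ALT allele by comparing REF and ALT lengths, following the variant classes the abstract lists. The helpers `parse_vcf_line` and `variant_kind` are hypothetical and are not the API of vcflib, bio-vcf, cyvcf2, hts-nim or slivar. They ignore header lines, sample columns and breakend notation.

```python
from typing import Dict, List, NamedTuple, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as written in the VCF text
    ref: str
    alts: List[str]
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            # Flag-type INFO keys carry no '=value' part; store them as True.
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True
    alts = [] if alt == "." else alt.split(",")
    return VcfRecord(chrom, int(pos), ref, alts, info_dict)


def variant_kind(ref: str, alt: str) -> str:
    """Crude classification of one REF/ALT pair by allele length (hypothetical helper)."""
    if alt.startswith("<"):
        return "symbolic/structural"  # e.g. <DEL>, <DUP>
    if len(ref) == len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "insertion" if len(alt) > len(ref) else "deletion"


record = parse_vcf_line("chr1\t12345\t.\tA\tG,AT\t50\tPASS\tDP=30;DB")
print(record.info)  # {'DP': '30', 'DB': True}
print([variant_kind(record.ref, a) for a in record.alts])  # ['SNV', 'insertion']
```

Classifying by length alone is deliberately crude. Complex substitutions such as `AC` to `GTT` are reported simply as insertions here.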
A map of constrained coding regions in the human genome
Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality. This study leverages coding variation observed among 123,136 individuals to create a detailed map of constrained coding regions in the human genome. This map may help identify critical regions within genes that, when mutated, cause embryonic lethality or severe developmental phenotypes.
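The core intuition behind a regional constraint map is that long stretches of coding sequence with no variation in a large cohort suggest constraint. One way to illustrate it is to split a coding interval at the positions where variants are observed and rank the resulting variant-free regions by length. This is a simplified sketch under stated assumptions. It treats a single contiguous interval and scores regions by raw length, so it is not the CCR method itself. A practical model would also need to account for factors such as sequencing coverage and sequence context. The function names are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open [start, end)


def variant_free_regions(start: int, end: int, variant_positions: List[int]) -> List[Region]:
    """Split [start, end) at observed variant positions, dropping the variant bases."""
    regions: List[Region] = []
    cursor = start
    for pos in sorted({p for p in variant_positions if start <= p < end}):
        if pos > cursor:
            regions.append((cursor, pos))
        cursor = pos + 1  # the variant base itself is not part of any region
    if cursor < end:
        regions.append((cursor, end))
    return regions


def rank_by_length(regions: List[Region]) -> List[Region]:
    """Longest variant-free regions first: a crude, unadjusted proxy for constraint."""
    return sorted(regions, key=lambda r: r[1] - r[0], reverse=True)


regions = variant_free_regions(0, 20, [3, 4, 10])
print(regions)                  # [(0, 3), (5, 10), (11, 20)]
print(rank_by_length(regions))  # [(11, 20), (5, 10), (0, 3)]
```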
GIGGLE: a search engine for large-scale integrated genome analysis
GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
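The basic query a genome search engine answers is how many intervals in each of many files overlap a set of query features. For small in-memory inputs, this can be sketched with two sorted arrays per chromosome. For half-open intervals, the number of database intervals overlapping a query [qs, qe) equals the number that start before qe minus the number that end at or before qs. This is an illustration only and not GIGGLE's index structure. It assumes well-formed intervals (start < end) and ranks files by raw overlap counts rather than by a significance score. `OverlapCounter` and `rank_files` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open, start < end


class OverlapCounter:
    """Counts intervals from one file that overlap a query interval."""

    def __init__(self, intervals: List[Interval]):
        self.starts: Dict[str, List[int]] = {}
        self.ends: Dict[str, List[int]] = {}
        for chrom, start, end in intervals:
            self.starts.setdefault(chrom, []).append(start)
            self.ends.setdefault(chrom, []).append(end)
        for table in (self.starts, self.ends):
            for values in table.values():
                values.sort()

    def count(self, chrom: str, qs: int, qe: int) -> int:
        starts = self.starts.get(chrom, [])
        ends = self.ends.get(chrom, [])
        # (# intervals with start < qe) - (# intervals with end <= qs)
        return bisect_left(starts, qe) - bisect_right(ends, qs)


def rank_files(query: List[Interval], files: Dict[str, OverlapCounter]) -> List[Tuple[str, int]]:
    """Total overlaps between the query set and each file, highest first."""
    totals = {
        name: sum(counter.count(c, s, e) for c, s, e in query)
        for name, counter in files.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


files = {
    "a": OverlapCounter([("chr1", 0, 10), ("chr1", 20, 30), ("chr2", 5, 15)]),
    "b": OverlapCounter([("chr1", 100, 200)]),
}
print(rank_files([("chr1", 5, 25), ("chr2", 0, 4)], files))  # [('a', 2), ('b', 0)]
```

The subtraction trick works because any interval ending at or before qs must also start before qe. Each query therefore costs two binary searches per file.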
Samplot: a platform for structural variant visual validation and automated filtering
Visual validation is an important step to minimize false-positive predictions from structural variant (SV) detection. We present Samplot, a tool for creating images that display the read depth and sequence alignments necessary to adjudicate purported SVs across samples and sequencing technologies. These images can be rapidly reviewed to curate large SV call sets. Samplot is applicable to many biological problems such as SV prioritization in disease studies, analysis of inherited variation, or de novo SV review. Samplot includes a machine learning package that dramatically decreases the number of false positives without human review. Samplot is available at https://github.com/ryanlayer/samplot.
Large, three-generation human families reveal post-zygotic mosaicism and variability in germline mutation accumulation
The number of de novo mutations (DNMs) found in an offspring's genome increases with both paternal and maternal age. But does the rate of mutation accumulation in human gametes differ across families? Using sequencing data from 33 large, three-generation CEPH families, we observed significant variability in parental age effects on DNM counts across families, ranging from 0.19 to 3.24 DNMs per year. Additionally, we found that ~3% of DNMs originated following primordial germ cell specification in a parent, and differed from non-mosaic germline DNMs in their mutational spectra. We also discovered that nearly 10% of candidate DNMs in the second generation were post-zygotic, and present in both somatic and germ cells; these gonosomal mutations occurred at equivalent frequencies on both parental haplotypes. Our results demonstrate that rates of germline mutation accumulation vary among families with similar ancestry, and confirm that post-zygotic mosaicism is a substantial source of human DNM. Humans receive half of their DNA from each of their parents. However, this inherited DNA is not identical to the corresponding half of the parents’ genetic material. Instead, both the egg and the sperm that combine to generate an embryo carry so-called ‘germline de novo’ mutations that are not present in the rest of the parents’ cells. Although these de novo mutations are an important source of genetic diversity, they can also cause disease. Geneticists have a longstanding interest in how, when and at what rate germline de novo mutations arise. These questions are commonly addressed by analyzing the DNA of large cohorts of two-generation families. Now, Sasani et al. have used the genetic data of 33 families in Utah, United States, which all span three generations, to determine the rate at which de novo mutations appear. The analysis revealed that, on average, each person has around 70 de novo mutations that were not present in their parents’ genetic code. Sasani et al. also found that sperm and egg cells from older parents typically contain more de novo mutations. However, this effect varied substantially across the Utah families. In some families, an increase of one year in the parents’ age resulted in over three extra de novo mutations in their children. In others, the number of new mutations barely increased at all. In addition, Sasani et al. found that almost 10% of de novo mutations do not occur in the parents’ sperm or eggs, but happen in the embryo very soon after fertilization. These mutations can lead to ‘mosaicism’, resulting in a person having a mutation in some, but not all, of their organs and tissues. In some cases, this could cause an unknown number of sperm and egg cells to carry a mutation that others do not. This makes it hard to predict how likely two or more siblings are to inherit the mutation. This analysis reveals that parental age affects the number of de novo mutations in children, but this effect changes from family to family. This finding could point to genetic or environmental factors that alter the human mutation rate.
STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci
Expansions of short tandem repeats (STRs) cause many rare diseases. Expansion detection is challenging with short-read DNA sequencing data since supporting reads are often mapped incorrectly. Detection is particularly difficult for “novel” STRs, which include new motifs at known loci or STRs absent from the reference genome. We developed STRling to efficiently count k-mers to recover informative reads and call expansions at known and novel STR loci. STRling is sensitive to known STR disease loci, has a low false discovery rate, and resolves novel STR expansions to base-pair position accuracy. It is fast, scalable, open-source, and available at github.com/quinlan-lab/STRling.
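k-mer counting helps flag repeat-rich reads because a read made of a tandem repeat with period p contains essentially only p distinct p-mers, namely the rotations of the motif. This sketch tests each period from 1 to 6 and reports the smallest period whose p most frequent p-mers cover most of the read. It is a hypothetical illustration of the general idea, not STRling's algorithm. Several details are assumptions: the 0.9 coverage threshold, the minimum of three motif copies, and using the lexicographically smallest rotation as the canonical motif (reverse complements are ignored).

```python
from collections import Counter
from typing import Optional, Tuple


def canonical_rotation(motif: str) -> str:
    """Lexicographically smallest rotation, so CAG, AGC and GCA map to one motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def dominant_repeat(read: str, max_period: int = 6, min_fraction: float = 0.9,
                    min_copies: int = 3) -> Optional[Tuple[str, int]]:
    """Return (canonical motif, period) if the read looks like a tandem repeat, else None."""
    for period in range(1, max_period + 1):
        if len(read) < min_copies * period:
            break  # too short to hold enough copies of this or any longer motif
        kmers = Counter(read[i:i + period] for i in range(len(read) - period + 1))
        total = len(read) - period + 1
        # In a perfect repeat, the `period` rotations of the motif account for every k-mer.
        covered = sum(count for _, count in kmers.most_common(period))
        if covered / total >= min_fraction:
            motif = kmers.most_common(1)[0][0]
            return canonical_rotation(motif), period
    return None


print(dominant_repeat("CAG" * 10))      # ('AGC', 3)
print(dominant_repeat("A" * 20))        # ('A', 1)
print(dominant_repeat("ACGTTGCAAGTC"))  # None
```

Returning the smallest qualifying period prevents a CAG repeat from also being reported as a period-6 CAGCAG motif.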
Combining genetic constraint with predictions of alternative splicing to prioritize deleterious splicing in rare disease studies
Background: Despite numerous molecular and computational advances, roughly half of patients with a rare disease remain undiagnosed after exome or genome sequencing. A particularly challenging barrier to diagnosis is identifying variants that cause deleterious alternative splicing at intronic or exonic loci outside of canonical donor or acceptor splice sites. Results: Several existing tools predict the likelihood that a genetic variant causes alternative splicing. We sought to extend such methods by developing a new metric that aids in discerning whether a genetic variant leads to deleterious alternative splicing. Our metric combines genetic variation in the Genome Aggregation Database with alternative splicing predictions from SpliceAI to compare observed and expected levels of splice-altering genetic variation. We infer that genic regions with significantly less splice-altering variation than expected are constrained. The resulting model of regional splicing constraint captures differential splicing constraint across gene and exon categories, and the most constrained genic regions are enriched for pathogenic splice-altering variants. Building from this model, we developed ConSpliceML. This ensemble machine learning approach combines regional splicing constraint with multiple per-nucleotide alternative splicing scores to guide the prediction of deleterious splicing variants in protein-coding genes. ConSpliceML distinguishes deleterious from benign splicing variants more accurately than state-of-the-art splicing prediction methods, especially in “cryptic” splicing regions beyond canonical donor or acceptor splice sites. Conclusion: Integrating a model of genetic constraint with annotations from existing alternative splicing tools allows ConSpliceML to prioritize potentially deleterious splice-altering variants in studies of rare human diseases.
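The abstract's comparison of observed and expected splice-altering variation can be sketched for a handful of regions. Given an observed variant count and an expected count from some background model, the sketch computes the observed/expected ratio and a one-sided Poisson probability of seeing so few variants. This is a generic constraint calculation under stated assumptions: counts are treated as Poisson, expected values come from a hypothetical upstream model, and the region names are made up. It is not ConSplice's published model or scoring.

```python
import math
from typing import List, Tuple


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def constraint_table(regions: List[Tuple[str, int, float]]) -> List[Tuple[str, float, float]]:
    """(name, observed/expected, one-sided p) per region, most depleted first.

    `regions` holds (name, observed_count, expected_count); expected counts are
    assumed to come from an upstream model of splice-altering variation.
    """
    rows = [(name, obs / exp, poisson_cdf(obs, exp)) for name, obs, exp in regions]
    return sorted(rows, key=lambda row: row[1])


for name, oe, p in constraint_table([("region_1", 2, 10.0), ("region_2", 9, 10.0)]):
    print(f"{name}\tO/E={oe:.2f}\tp={p:.4f}")
```

A low O/E ratio paired with a small lower-tail probability marks a region as depleted of splice-altering variation relative to expectation.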
Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches
Background: When interpreting sequencing data from multiple spatial or longitudinal biopsies, detecting sample mix-ups is essential, yet more difficult than in studies of germline variation. In most genomic studies of tumors, genetic variation is detected through pairwise comparisons of the tumor and a matched normal tissue from the sample donor. In many cases, only somatic variants are reported, which hinders the use of existing tools that detect sample swaps solely based on genotypes of inherited variants. To address this problem, we have developed Somalier, a tool that operates directly on alignments and does not require jointly called germline variants. Instead, Somalier extracts a small sketch of informative genetic variation for each sample. Sketches from hundreds of germline or somatic samples can then be compared in under a second, making Somalier a useful tool for measuring relatedness in large cohorts. Somalier produces both text output and an interactive visual report that facilitates the detection and correction of sample swaps using multiple relatedness metrics. Results: We introduce the tool and demonstrate its utility on a cohort of five glioma samples, each with a normal, tumor, and cell-free DNA sample. Applying Somalier to high-coverage sequence data from the 1000 Genomes Project also identifies several related samples. We also demonstrate that it can distinguish pairs of whole-genome and RNA-seq samples from the same individuals in the Genotype-Tissue Expression (GTEx) project. Conclusions: Somalier is a tool that can rapidly evaluate relatedness from sequencing data. It can be applied to diverse sequencing data types and genome builds and is available under an MIT license at github.com/brentp/somalier.
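Relatedness from a genotype sketch can be illustrated with a few counts over a shared set of sites. These are the heterozygous calls in each sample, the sites where both samples are heterozygous, and the IBS0 sites where one sample is homozygous reference and the other homozygous alternate. This sketch combines them as (shared hets − 2 × IBS0) / min(hets), a KING-style statistic. It equals 1 for identical samples and is expected to be near 0 for unrelated samples at sites with allele frequencies near 0.5. Treat it as an illustration built on assumptions. Genotypes are encoded as alternate-allele counts (0, 1, 2, with -1 for missing), and only sites called in both samples are used. Whether somalier computes exactly this quantity should be checked against its documentation.

```python
import math
from typing import Dict, List, Union


def pairwise_relatedness(a: List[int], b: List[int]) -> Dict[str, Union[int, float]]:
    """Compare two samples' genotypes at the same ordered set of sketch sites.

    Genotypes are alternate-allele counts: 0 (hom-ref), 1 (het), 2 (hom-alt), -1 (missing).
    """
    if len(a) != len(b):
        raise ValueError("samples must be genotyped at the same sites")
    hets_a = hets_b = shared_hets = ibs0 = 0
    for ga, gb in zip(a, b):
        if ga < 0 or gb < 0:
            continue  # use only sites called in both samples
        hets_a += int(ga == 1)
        hets_b += int(gb == 1)
        shared_hets += int(ga == 1 and gb == 1)
        ibs0 += int(abs(ga - gb) == 2)  # one hom-ref, the other hom-alt
    denom = min(hets_a, hets_b)
    rel = (shared_hets - 2 * ibs0) / denom if denom else math.nan
    return {"hets_a": hets_a, "hets_b": hets_b, "shared_hets": shared_hets,
            "ibs0": ibs0, "relatedness": rel}


sample = [0, 1, 2, 1, 1, 0, -1]
print(pairwise_relatedness(sample, sample)["relatedness"])  # 1.0
```

IBS0 sites are penalized twice because opposite homozygotes are impossible between parent and child and rare between close relatives. Each such site is strong evidence against relatedness.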
Following Tetraploidy in Maize, a Short Deletion Mechanism Removed Genes Preferentially from One of the Two Homeologs
Previous work in Arabidopsis showed that after an ancient tetraploidy event, genes were preferentially removed from one of the two homeologs, a process known as fractionation. The mechanism of fractionation is unknown. We sought to determine whether such preferential, or biased, fractionation exists in maize and, if so, whether a specific mechanism could be implicated in this process. We studied the process of fractionation using two recently sequenced grass species: sorghum and maize. The maize lineage has experienced a tetraploidy since its divergence from sorghum approximately 12 million years ago, and fragments of many knocked-out genes retain enough sequence similarity to be easily identifiable. Using sorghum exons as query sequences, we studied the fate of both orthologous genes in maize following the maize tetraploidy. We show that genes are predominantly lost, not relocated, and that single-gene loss by deletion is the rule. Based on comparisons with orthologous sorghum and rice genes, we also infer that the sequences present before the deletion events were flanked by short direct repeats, a signature of intra-chromosomal recombination. Evidence of this deletion mechanism is found 2.3 times more frequently on one of the maize homeologs, consistent with earlier observations of biased fractionation. The over-fractionated homeolog is also a more than 3-fold better target for transposon removal, but does not have an observably higher synonymous base substitution rate, nor could we find differentially placed methylation domains. We conclude that fractionation is indeed biased in maize and that intra-chromosomal recombination, or possibly a similar illegitimate recombination, is the primary mechanism by which fractionation occurs. The mechanism of intra-chromosomal recombination explains the observed bias in both gene and transposon loss in the maize lineage. The existence of fractionation bias demonstrates that the frequency of deletion is modulated. Among the evolutionary benefits of this deletion/fractionation mechanism are bulk DNA removal and the generation of novel combinations of regulatory sequences and coding regions.
Relationship of DNA Methylation and Gene Expression in Idiopathic Pulmonary Fibrosis
Idiopathic pulmonary fibrosis (IPF) is an untreatable and often fatal lung disease that is increasing in prevalence and is caused by complex interactions between genetic and environmental factors. Epigenetic mechanisms control gene expression and are likely to regulate the IPF transcriptome. Our objective was to identify methylation marks that modify gene expression in IPF lung. We assessed DNA methylation (comprehensive high-throughput arrays for relative methylation [CHARM]) and gene expression (Agilent gene expression arrays) in 94 patients with IPF and 67 control subjects, and performed integrative genomic analyses to define methylation-gene expression relationships in IPF lung. We validated methylation changes by a targeted analysis (Epityper), and performed functional validation of one of the genes identified by our analysis. We identified 2,130 differentially methylated regions (DMRs; <5% false discovery rate), of which 738 are associated with significant changes in gene expression and enriched for the expected inverse relationship between methylation and expression (P < 2.2 × 10⁻¹⁶). We validated 13/15 DMRs by targeted analysis of methylation. Methylation-expression quantitative trait loci (methyl-eQTL) identified methylation marks that control cis and trans gene expression, with an enrichment for cis relationships (P < 2.2 × 10⁻¹⁶). We found five trans methyl-eQTLs where a methylation change at a single DMR is associated with transcriptional changes in a substantial number of genes; four of these DMRs are near transcription factors (castor zinc finger 1 [CASZ1], FOXC1, MXD4, and ZDHHC4). We studied the in vitro effects of changes in CASZ1 expression and validated its role in regulation of target genes in the methyl-eQTL. These results suggest that DNA methylation may be involved in the pathogenesis of IPF.