86 result(s) for "Rahmann, Sven"
Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]
Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
Swiftly identifying strongly unique k-mers
Motivation Short DNA sequences of length k that appear in a single location (e.g., at a single genomic position, in a single species from a larger set of species, etc.) are called unique k-mers. They are useful for placing sequenced DNA fragments at the correct location without computing alignments and without ambiguity. However, they are not necessarily robust: A single basepair change may turn a unique k-mer into a different one that may in fact be present at one or more different locations, which may give confusing or contradictory information when attempting to place a read by its k-mer content. A more robust concept is that of strongly unique k-mers, i.e., unique k-mers for which no Hamming-distance-1 neighbor with conflicting information exists in all of the considered sequences. Given a set of k-mers, it is therefore of interest to have an efficient method that can distinguish k-mers with a Hamming-distance-1 neighbor in the collection from those without one. Results We present engineered algorithms to identify and mark within a set K of (canonical) k-mers all elements that have a Hamming-distance-1 neighbor in the same set. One algorithm is based on recursively running a 4-way comparison on sub-intervals of the sorted set. The other algorithm is based on bucketing and running a pairwise bit-parallel Hamming distance test on small buckets of the sorted set. Both methods consider canonical k-mers (i.e., taking reverse complements into account) and allow for efficient parallelization. The methods have been implemented and applied in practice to sets consisting of several billions of k-mers. An optimized combined approach running with 16 threads on a 16-core workstation yields wall times below 20 seconds on the 2.5 billion distinct 31-mers of the human telomere-to-telomere reference genome. Availability An implementation can be found at https://gitlab.com/rahmannlab/strong-k-mers.
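The idea of marking k-mers that have a Hamming-distance-1 neighbor in the same set can be illustrated with a minimal Python sketch. This is not the engineered algorithm from the abstract: it compares all pairs in quadratic time, ignores reverse complements (canonical k-mers), and does no sorting or bucketing. The function names `encode`, `hamming_distance`, and `mark_weak` are hypothetical and chosen only for illustration. What it does show is a bit-parallel Hamming distance test on 2-bit encoded k-mers.

```python
from itertools import combinations

# Assumed 2-bit encoding of the DNA alphabet (illustrative choice).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Pack a DNA k-mer into an integer using 2 bits per base."""
    code = 0
    for base in kmer:
        code = (code << 2) | ENCODE[base]
    return code


def hamming_distance(x, y, k):
    """Bit-parallel Hamming distance between two 2-bit encoded k-mers."""
    diff = x ^ y
    low_mask = int("01" * k, 2)  # selects the low bit of every 2-bit base
    # A base differs if either of its two bits differs: fold the high bit
    # onto the low bit, then keep only the low bits and count them.
    differing = (diff | (diff >> 1)) & low_mask
    return bin(differing).count("1")


def mark_weak(kmers):
    """Return all k-mers that have a Hamming-distance-1 neighbor in the set.

    Naive all-pairs comparison, for illustration only.
    """
    k = len(kmers[0])
    codes = {kmer: encode(kmer) for kmer in kmers}
    weak = set()
    for a, b in combinations(kmers, 2):
        if hamming_distance(codes[a], codes[b], k) == 1:
            weak.update((a, b))
    return weak
```

For example, `mark_weak(["ACGT", "ACGA", "TTTT", "GGGG"])` marks `ACGT` and `ACGA`, which differ only in their last base. The all-pairs loop is only practical for small sets; the abstract's methods restrict comparisons to sub-intervals or small buckets of the sorted set.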
Fast lightweight accurate xenograft sorting
Motivation With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species’ (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are first mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results. Results We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art in sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further. Availability Our software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.
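The general principle of alignment-free read sorting can be sketched in Python: look up each k-mer of a read in a table that records whether the k-mer occurs in the host genome, the graft genome, or both. This is not xengsort's implementation. It uses a plain Python dict rather than bucketed quotiented Cuckoo hashing, ignores reverse complements, and applies a simple majority vote that is an assumption made here for illustration. All function and constant names are hypothetical.

```python
# Bit flags recording which genome(s) a k-mer occurs in (illustrative).
HOST, GRAFT = 1, 2


def kmers(seq, k):
    """Yield all overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_table(host_genome, graft_genome, k):
    """Map each k-mer to HOST, GRAFT, or HOST | GRAFT (present in both)."""
    table = {}
    for kmer in kmers(host_genome, k):
        table[kmer] = table.get(kmer, 0) | HOST
    for kmer in kmers(graft_genome, k):
        table[kmer] = table.get(kmer, 0) | GRAFT
    return table


def classify_read(read, table, k):
    """Assign a read by counting k-mers specific to one genome.

    The majority-vote rule is a simplifying assumption, not the
    decision rule of any particular tool.
    """
    host = graft = 0
    for kmer in kmers(read, k):
        label = table.get(kmer, 0)  # 0 means the k-mer is unknown
        if label == HOST:
            host += 1
        elif label == GRAFT:
            graft += 1
    if host > graft:
        return "host"
    if graft > host:
        return "graft"
    return "ambiguous"
```

With `table = build_table("AAAAACCCCC", "GGGGGTTTTT", 3)`, the call `classify_read("AAACC", table, 3)` returns `"host"` and `classify_read("GGTTT", table, 3)` returns `"graft"`. No alignment is computed at any point; the work per read is a sequence of table lookups, which is why lookup speed and memory footprint of the hash table dominate the performance of such tools.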
N6-Adenosine Methylation in MiRNAs
Methylation of N6-adenosine (m6A) has been observed in many different classes of RNA, but its prevalence in microRNAs (miRNAs) has not yet been studied. Here we show that a knockdown of the m6A demethylase FTO affects the steady-state levels of several miRNAs. Moreover, RNA immunoprecipitation with an anti-m6A-antibody followed by RNA-seq revealed that a significant fraction of miRNAs contains m6A. By motif searches we have discovered consensus sequences discriminating between methylated and unmethylated miRNAs. The epigenetic modification of an epigenetic modifier as described here adds a new layer to the complexity of the posttranscriptional regulation of gene expression.
Epitope similarity cannot explain the pre-formed T cell immunity towards structural SARS-CoV-2 proteins
The current pandemic is caused by the SARS-CoV-2 virus, and considerable progress in understanding the pathology of the virus has been made since its emergence in late 2019. Several reports indicate short-lasting immunity against endemic coronaviruses, which contrasts with studies showing that biobanked venous blood contains T cells reactive to SARS-CoV-2 S-protein even before the outbreak in Wuhan. This suggests a preformed T cell memory towards structural proteins in individuals not exposed to SARS-CoV-2. Given the similarity of SARS-CoV-2 to other members of the Coronaviridae family, the endemic coronaviruses appear likely candidates to generate this T cell memory. However, given the apparently poor immunological memory created by the endemic coronaviruses, immunity against other common pathogens might offer an alternative explanation. Here, we utilize a combination of epitope prediction and similarity to common human pathogens to identify potential sources of the SARS-CoV-2 T cell memory. Although beta-coronaviruses are the most likely candidates to explain the pre-existing SARS-CoV-2 reactive T cells in uninfected individuals, the SARS-CoV-2 epitopes with the highest similarity to those from beta-coronaviruses are confined to replication-associated proteins, not the host-interacting S-protein. Thus, our study suggests that the observed SARS-CoV-2 pre-formed immunity to structural proteins is not driven by near-identical epitopes.
wg-blimp: an end-to-end analysis pipeline for whole genome bisulfite sequencing data
Background Analysing whole genome bisulfite sequencing datasets is a data-intensive task that requires comprehensive and reproducible workflows to generate valid results. While many algorithms have been developed for tasks such as alignment, comprehensive end-to-end pipelines are still sparse. Furthermore, previous pipelines lack features or show technical deficiencies, thus impeding analyses. Results We developed wg-blimp (whole genome bisulfite sequencing methylation analysis pipeline) as an end-to-end pipeline to ease whole genome bisulfite sequencing data analysis. It integrates established algorithms for alignment, quality control, methylation calling, detection of differentially methylated regions, and methylome segmentation, requiring only a reference genome and raw sequencing data as input. Comparing wg-blimp to previous end-to-end pipelines reveals similar setups for common sequence processing tasks, but shows differences for post-alignment analyses. We improve on previous pipelines by providing a more comprehensive analysis workflow as well as an interactive user interface. To demonstrate wg-blimp’s ability to produce correct results we used it to call differentially methylated regions for two publicly available datasets. We were able to replicate 112 of 114 previously published regions, and found results to be consistent with previous findings. We further applied wg-blimp to a publicly available sample of embryonic stem cells to showcase methylome segmentation. As expected, unmethylated regions were in close proximity of transcription start sites. Segmentation results were consistent with previous analyses, despite different reference genomes and sequencing techniques. Conclusions wg-blimp provides a comprehensive analysis pipeline for whole genome bisulfite sequencing data as well as a user interface for simplified result inspection. We demonstrated its applicability by analysing multiple publicly available datasets. Thus, wg-blimp is a relevant alternative to previous analysis pipelines and may facilitate future epigenetic research.
A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification
Background Mixtures of beta distributions are a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1. Methods While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Our algorithm combines latent variables with the method of moments instead of maximum likelihood, which has computational advantages over the popular EM algorithm. Results As an application, we demonstrate that methylation state classification is more accurate when using adaptive thresholds from beta mixtures than non-adaptive thresholds on observed methylation levels. We also demonstrate that we can accurately infer the number of mixture components. Conclusions The hybrid algorithm between likelihood-based component un-mixing and moment-based parameter estimation is a robust and efficient method for beta mixture estimation. We provide an implementation of the method (“betamix”) as open source software under the MIT license.
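The moment-based step mentioned in the abstract can be illustrated with the standard method-of-moments estimator for a single beta distribution: given mean m and variance v, the parameters are alpha = m·c and beta = (1 − m)·c with c = m(1 − m)/v − 1. The Python sketch below is not the betamix implementation. The function name `beta_moments` is hypothetical, and the optional `weights` argument is an assumption standing in for per-observation component weights in a mixture setting.

```python
def beta_moments(values, weights=None):
    """Method-of-moments estimate of beta parameters (alpha, beta).

    `weights` (hypothetical) lets each observation count fractionally,
    as it would when observations are softly assigned to a mixture
    component. Requires a positive weighted variance.
    """
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / total
    # Invert the beta mean/variance formulas:
    #   mean = a / (a + b),  var = mean * (1 - mean) / (a + b + 1)
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common
```

Because only the sample mean and variance are used, observations equal to 0 or 1 cause no difficulty here, whereas the beta log-likelihood is singular at those values. For instance, `beta_moments([0.0, 0.5, 1.0])` yields alpha = beta = 0.25.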
Evolution of heterotrophy in chrysophytes as reflected by comparative transcriptomics
Shifts in the nutritional mode between phototrophy, mixotrophy and heterotrophy are a widespread phenomenon in the evolution of eukaryotic diversity. The transition between nutritional modes is particularly pronounced in chrysophytes and occurred independently several times through parallel evolution. Thus, chrysophytes provide a unique opportunity for studying the molecular basis of nutritional diversification and of the accompanying pathway reduction and degradation of plastid structures. In order to analyze the succession in switching the nutritional mode from mixotrophy to heterotrophy, we compared the transcriptome of the mixotrophic Poterioochromonas malhamensis with the transcriptomes of three obligate heterotrophic species of Ochromonadales. We used the transcriptome of P. malhamensis as a reference for plastid reduction in the heterotrophic taxa. The analyzed heterotrophic taxa were in different stages of plastid reduction. We investigated the reduction of several photosynthesis-related pathways (e.g., the xanthophyll cycle, the mevalonate pathway, the shikimate pathway and tryptophan biosynthesis) as well as the reduction of plastid structures, and we postulate a likely succession of pathway reduction and degradation of accompanying structures.
Effects of Silver Nitrate and Silver Nanoparticles on a Planktonic Community: General Trends after Short-Term Exposure
Among metal pollutants, silver ions are among the most toxic forms and have thus been assigned to the highest toxicity class. Silver's toxicity to a wide range of microorganisms, combined with its low toxicity to humans, has led to the development of a wealth of silver-based products in many bactericidal applications, amounting to more than 1000 nanotechnology-based consumer products. Accordingly, silver is a widely distributed metal in the environment, originating from its different forms of application as metal, salt and nanoparticle. A realistic assessment of silver nanoparticle toxicity in natural waters is, however, problematic and needs to be linked to experimental approaches. Here we apply metatranscriptome sequencing, which allows the reactions of whole communities present in a water sample to stressors to be elucidated. We compared the toxicity of ionic silver and ligand-free silver nanoparticles by short-term exposure of a natural community of aquatic microorganisms. We analyzed the effects of the treatments on metabolic pathways and species composition at the eukaryote metatranscriptome level in order to describe immediate molecular responses of organisms using a community approach. We found significant differences between the samples treated with 5 µg/L AgNO3 and the controls, but no significant differences between the samples treated with AgNP and the control samples. Statistical analysis yielded 126 genes (KO-IDs) with significant differential expression at a false discovery rate (FDR) <0.05 between the control (KO) and AgNO3 (NO3) groups. A KEGG pathway enrichment analysis showed significant results with an FDR below 0.05 for pathways related to photosynthesis. Our study therefore supports the view that ionic silver rather than silver nanoparticles is responsible for silver toxicity. Furthermore, our results highlight the strength of metatranscriptome approaches for assessing metal toxicity on aquatic communities.