38 results for "Schliep, Alexander"
Compressed computations using wavelets for hidden Markov models with continuous observations
Compression as an accelerant of computation is increasingly recognized as an important component in engineering fast real-world machine learning methods for big data; cf. its impact on genome-scale approximate string matching. Previous work showed that compression can accelerate algorithms for Hidden Markov Models (HMM) with discrete observations, both for the classical frequentist HMM algorithms (Forward Filtering, Backward Smoothing, and Viterbi) and for Gibbs sampling in Bayesian HMM. For Bayesian HMM with continuous-valued observations, compression was shown to greatly accelerate computations for specific types of data. For instance, data from large-scale experiments interrogating structural genetic variation can be assumed to be piecewise constant with noise or, equivalently, generated by HMM with dominant self-transition probabilities. Here we extend the compressive computation approach to the classical frequentist HMM algorithms on continuous-valued observations, providing the first compressive approach for this problem. In a large-scale simulation study, we demonstrate empirically that in many settings compressed HMM algorithms clearly outperform the classical algorithms, with no or only a negligible effect on the computed probabilities and the inferred maximum-likelihood state paths. This provides an efficient approach to big data computations with HMM. An open-source implementation of the method is available from https://github.com/lucabello/wavelet-hmms.
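To make the idea concrete, below is a minimal sketch of a block-wise ("compressed") forward pass for a Gaussian HMM. It assumes observations have already been summarized into (value, width) blocks, e.g. by a wavelet-based segmentation, and it illustrates the general technique only; it is not the authors' implementation (for that, see the linked repository).

import numpy as np
from scipy.stats import norm

def compressed_forward_loglik(blocks, A, pi, means, sds):
    # blocks: list of (value, width) summaries; A: KxK transition matrix;
    # pi: initial state distribution; means, sds: per-state Gaussian params.
    # A block of width w with representative value v applies the per-step
    # update alpha <- (alpha @ A) * b(v) as the matrix power M(v)^w with
    # M(v) = A * diag(b(v)), mirroring the compressed HMM algorithms known
    # for discrete observations.
    K = len(pi)
    loglik, alpha = 0.0, None
    for v, w in blocks:
        b = norm.pdf(v, means, sds)        # emission density per state
        M = A * b                          # M[i, j] = A[i, j] * b[j]
        if alpha is None:                  # first observation of the sequence
            alpha, w = pi * b, w - 1
        if w > 0:
            # For very wide blocks one would also rescale inside the power
            # to guard against underflow; omitted here for brevity.
            alpha = alpha @ np.linalg.matrix_power(M, w)
        s = alpha.sum()                    # rescale to avoid underflow
        loglik += np.log(s)
        alpha /= s
    return loglik

Since matrix_power needs only O(log w) matrix products, a block of width w costs far less than w per-observation updates; with dominant self-transitions wide blocks are common, which is where the reported speed-ups come from.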
Fast parallel construction of variable-length Markov chains
Background Alignment-free methods are a popular approach for comparing biological sequences, including complete genomes. The methods range from probability distributions of sequence composition to first- and higher-order Markov chains, where a k-th order Markov chain over DNA has 4^k formal parameters. To circumvent this exponential growth in parameters, variable-length Markov chains (VLMCs) have gained popularity for applications in molecular biology and other areas. VLMCs adapt the model depth depending on sequence context and thus curtail excesses in the number of parameters. The scarcity of fast, let alone parallel, software tools prompted the development of a parallel implementation using lazy suffix trees and a hash-based alternative. Results An extensive evaluation was performed on genomes ranging from 12 Mbp to 22 Gbp. Relevant learning parameters were chosen guided by the Bayesian Information Criterion (BIC) to avoid over-fitting. Our implementation greatly improves upon the state-of-the-art even in serial execution. It exhibits very good parallel scaling, with speed-ups for long sequences close to the optimum indicated by Amdahl's law: about 3 for 4 threads and about 6 for 16 threads, respectively. Conclusions Our parallel implementation, released as open source under the GPLv3 license, provides a practically useful alternative to the state-of-the-art and allows the construction of VLMCs even for very large genomes significantly faster than previously possible. Additionally, our parameter selection based on BIC gives guidance to end-users comparing genomes.
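The core idea, keeping a longer context only when it pays for its extra parameters, fits in a few lines. The following toy learner uses an assumed BIC-style pruning rule for illustration; it is not the parallel lazy-suffix-tree implementation the abstract describes.

from collections import defaultdict
from math import log

def train_vlmc(seq, max_depth=4, alphabet="ACGT"):
    # Count next-symbol frequencies for every context up to max_depth.
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq)):
        for d in range(max_depth + 1):
            if i - d < 0:
                break
            counts[seq[i - d:i]][seq[i]] += 1

    def probs(ctx):
        total = sum(counts[ctx].values())
        return {a: c / total for a, c in counts[ctx].items()}

    # BIC-style rule: extending a context adds |alphabet| - 1 free
    # parameters, so keep it only if its log-likelihood gain over the
    # (one symbol shorter) parent context exceeds the penalty.
    penalty = (len(alphabet) - 1) / 2 * log(len(seq))
    kept = {""}
    for ctx in sorted(counts, key=len):      # parents are decided first
        if not ctx or ctx[1:] not in kept:
            continue
        parent, child = probs(ctx[1:]), probs(ctx)
        gain = sum(n * log(child[a] / parent[a])
                   for a, n in counts[ctx].items())
        if gain > penalty:
            kept.add(ctx)
    return kept

print(train_vlmc("ACGT" * 2000 + "AAAA" * 500, max_depth=3))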
The Global Museum: natural history collections and the future of evolutionary science and public education
Natural history museums are unique spaces for interdisciplinary research and educational innovation. Through extensive exhibits and public programming and by hosting rich communities of amateurs, students, and researchers at all stages of their careers, they can provide a place-based window to focus on integration of science and discovery, as well as a locus for community engagement. At the same time, like a synthesis radio telescope, when joined together through emerging digital resources, the global community of museums (the ‘Global Museum’) is more than the sum of its parts, allowing insights and answers to diverse biological, environmental, and societal questions at the global scale, across eons of time, and spanning vast diversity across the Tree of Life. We argue that, whereas natural history collections and museums began with a focus on describing the diversity and peculiarities of species on Earth, they are now increasingly leveraged in new ways that significantly expand their impact and relevance. These new directions include the possibility to ask new, often interdisciplinary questions in basic and applied science, such as in biomimetic design, and by contributing to solutions to climate change, global health and food security challenges. As institutions, they have long been incubators for cutting-edge research in biology while simultaneously providing core infrastructure for research on present and future societal needs. Here we explore how the intersection between pressing issues in environmental and human health and rapid technological innovation has reinforced the relevance of museum collections. We do this by providing examples as food for thought for both the broader academic community and museum scientists on the evolving role of museums. We also identify challenges to the realization of the full potential of natural history collections and the Global Museum to science and society and discuss the critical need to grow these collections. We then focus on mapping and modelling of museum data (including place-based approaches and discovery), and explore the main projects, platforms and databases enabling this growth. Finally, we aim to improve relevant protocols for the long-term storage of specimens and tissues, ensuring proper connection with tomorrow’s technologies and hence further increasing the relevance of natural history museums.
Effects of network topology on the performance of consensus and distributed learning of SVMs using ADMM
The Alternating Direction Method of Multipliers (ADMM) is a popular and promising distributed framework for solving large-scale machine learning problems. We consider decentralized consensus-based ADMM, in which nodes may only communicate with one-hop neighbors; this may cause slow convergence. We investigate the impact of network topology on the performance of ADMM-based learning of a Support Vector Machine using expander and mean-degree graphs, as well as some common modern network topologies. In particular, we investigate to which degree the expansion property of the network influences convergence in terms of iterations, training time, and communication time, and we suggest which topologies are preferable. Additionally, we provide an implementation that makes these theoretical advances easily available. The results show that the convergence of decentralized ADMM-based learning of SVMs improves on graphs with large spectral gaps and higher, homogeneous degrees.
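The link between expansion and convergence can be probed directly: the spectral gap of a topology's normalized Laplacian measures how fast information mixes across the network. A small sketch, with illustrative stand-in graphs rather than the paper's test set:

import networkx as nx
import numpy as np

def spectral_gap(G):
    # Second-smallest eigenvalue of the normalized graph Laplacian; it is
    # zero only for disconnected graphs and grows with expansion quality.
    L = nx.normalized_laplacian_matrix(G).toarray()
    return float(np.sort(np.linalg.eigvalsh(L))[1])

for name, G in {
    "ring (cycle)": nx.cycle_graph(16),
    "3-regular random (expander-like)": nx.random_regular_graph(3, 16, seed=0),
    "complete": nx.complete_graph(16),
}.items():
    print(f"{name:35s} gap = {spectral_gap(G):.3f}")

The ring has a tiny gap (information must travel around the circumference), while the random regular graph already comes close to the well-connected extreme, consistent with the abstract's finding that large spectral gaps help.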
Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation, supermatrix approaches, or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which phylogeny building increasingly relies; we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression
By integrating Haar wavelets with Hidden Markov Models, we achieve drastically reduced running times for Bayesian inference using Forward-Backward Gibbs sampling. We show that this improves detection of genomic copy number variants (CNV) in array CGH experiments compared to the state-of-the-art, including standard Gibbs sampling. The method concentrates computational effort on chromosomal segments that are difficult to call by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open-source software implementation of our method is available at http://schlieplab.org/Software/HaMMLET/ (DOI: 10.5281/zenodo.46262). This paper was selected for oral presentation at RECOMB 2016, and an abstract is published in the conference proceedings.
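A toy version of the adaptive block structure such a method derives from the Haar wavelet transform: split a stretch of data wherever the Haar detail coefficient (the difference of the two half-means) is large, otherwise summarize the stretch as one block. This illustrates the principle only, not HaMMLET's actual algorithm; blocks of this form can feed block-wise HMM updates like the compressed forward pass sketched earlier.

import numpy as np

def haar_blocks(x, threshold):
    # Emit (mean, width) blocks: recurse wherever the unnormalized Haar
    # detail coefficient exceeds the threshold, otherwise merge the whole
    # stretch. (Toy caveat: perfectly symmetric structure within a stretch
    # can fool a single coefficient; the real transform looks at all scales.)
    n = len(x)
    if n <= 1:
        return [(float(x.mean()), n)] if n else []
    h = n // 2
    if abs(x[:h].mean() - x[h:].mean()) <= threshold:
        return [(float(x.mean()), n)]
    return haar_blocks(x[:h], threshold) + haar_blocks(x[h:], threshold)

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 0.1, 300), rng.normal(2.0, 0.1, 200)])
print(len(haar_blocks(y, threshold=0.3)))  # few blocks; splits near index 300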
Predicting progression and cognitive decline in amyloid-positive patients with Alzheimer’s disease
Background In Alzheimer’s disease, amyloid-β (Aβ) peptides aggregate in the brain, lowering CSF amyloid levels, a key pathological hallmark of the disease. However, lowered CSF amyloid levels may also be present in cognitively unimpaired elderly individuals. Therefore, it is of great value to explain the variance in disease progression among patients with Aβ pathology. Methods A cohort of n = 2293 participants, of whom n = 749 were Aβ positive, was selected from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database to study heterogeneity in disease progression for individuals with Aβ pathology. The analysis used baseline clinical variables, including demographics, genetic markers, and neuropsychological data, to predict how the cognitive ability and AD diagnosis of subjects progressed, using statistical models and machine learning. Due to the relatively low prevalence of Aβ pathology, models fit only to Aβ-positive subjects were compared to models fit to an extended cohort including subjects without established Aβ pathology, adjusting for covariate differences between the cohorts. Results Aβ pathology status was determined based on the Aβ42/Aβ40 ratio. The best predictive model of change in cognitive test scores for Aβ-positive subjects at the 2-year follow-up achieved an R² score of 0.388, while the best model predicting adverse changes in diagnosis achieved a weighted F1 score of 0.791. Aβ-positive subjects declined faster on average than those without Aβ pathology, but the specific level of CSF Aβ was not predictive of progression rate. When predicting cognitive score change 4 years after baseline, the best model achieved an R² score of 0.325, and fitting models to the extended cohort improved performance. Moreover, using all clinical variables outperformed the best model based only on a suite of cognitive test scores, which achieved an R² score of 0.228. Conclusion Our analysis shows that CSF levels of Aβ are not strong predictors of the rate of cognitive decline in Aβ-positive subjects when adjusting for other variables. Baseline assessments of cognitive function account for the majority of the variance explained in the prediction of 2-year decline but are insufficient for achieving optimal results in longer-term predictions. Predicting changes both in cognitive test scores and in diagnosis provides multiple perspectives on the progression of potential AD subjects.
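The evaluation setup, predicting a change score from baseline variables and reporting cross-validated R², is easy to reproduce in outline. Everything below is a placeholder: synthetic features stand in for demographics, genetics, and neuropsychological scores (ADNI data requires an access application), and the model is an arbitrary choice, not the study's.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_features = 749, 5          # 749 = size of the Abeta+ cohort
X = rng.normal(size=(n_subjects, n_features))   # fake baseline variables
# Fake outcome: 2-year change in a cognitive score driven by two features.
y = 0.6 * X[:, 0] - 0.3 * X[:, 3] + rng.normal(scale=1.0, size=n_subjects)

model = GradientBoostingRegressor(random_state=0)
print("cross-validated R^2 =",
      cross_val_score(model, X, y, cv=5, scoring="r2").mean().round(3))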
Evolution shapes and conserves genomic signatures in viruses
The genomic signature of an organism captures the characteristics of repeated oligonucleotide patterns in its genome, such as oligomer frequencies, GC content, and differences in codon usage. Viruses, however, are obligate intracellular parasites that are dependent on their host cells for replication, and information about genomic signatures in viruses has hitherto been sparse. Here, we investigate the presence and specificity of genomic signatures in 2,768 eukaryotic viral species from 105 viral families, aiming to illuminate dependencies and selective pressures in viral genome evolution. We demonstrate that most viruses have highly specific genomic signatures that often also differ significantly between species within the same family. The species-specificity is most prominent among dsDNA viruses and viruses with large genomes. We also reveal consistent dissimilarities between viral genomic signatures and those of their host cells, although some viruses present slight similarities, which may be explained by genetic adaptation to their native hosts. Our results suggest that significant evolutionary selection pressures act upon viral genomes to shape and preserve their genomic signatures, which may have implications for the field of synthetic biology in the construction of live attenuated vaccines and viral vectors.
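One common concrete form of such a signature is a k-mer frequency vector. A minimal sketch with k = 4 and random placeholder sequences (the study, of course, analyzed real viral genomes):

from collections import Counter
from itertools import product
import numpy as np

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def signature(seq, k=4):
    # Sliding-window k-mer counts, normalized to frequencies.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vec = np.array([counts[m] for m in KMERS], dtype=float)
    return vec / vec.sum()

rng = np.random.default_rng(0)
g1 = "".join(rng.choice(list("ACGT"), size=5000))
g2 = "".join(rng.choice(list("ACGT"), size=5000, p=[0.3, 0.2, 0.2, 0.3]))
# Mean absolute difference between signatures as a simple distance.
print(np.abs(signature(g1) - signature(g2)).mean())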
Fast MCMC sampling for hidden Markov models to determine copy number variations
Background Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of an HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, likelihood-based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next-generation sequencing amplify these problems. Results We propose an approximate sampling technique, inspired by the compression of discrete sequences in HMM computations and by kd-trees, which leverages spatial relations between data points in typical datasets to speed up the MCMC sampling. Conclusions We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate speed-ups of 10 to 60, and of 90, respectively, while achieving results competitive with state-of-the-art Bayesian approaches. Availability: An implementation of our method will be made available as part of the open-source GHMM library from http://ghmm.org.
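The speed-up hinges on treating runs of similar observations as single units, so per-block rather than per-observation MCMC updates suffice. The sketch below shows the compression step in its crudest form, quantize and run-length encode; the method described above uses kd-tree-inspired spatial grouping rather than plain quantization, so take this only as an illustration of why the block count, not the observation count, drives the cost.

import numpy as np

def run_length_blocks(x, n_bins=16):
    # Quantize into equal-width bins, then merge consecutive observations
    # that fall into the same bin into (mean, width) blocks.
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bins = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    blocks, start = [], 0
    for i in range(1, len(x) + 1):
        if i == len(x) or bins[i] != bins[start]:
            blocks.append((float(x[start:i].mean()), i - start))
            start = i
    return blocks

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.05, 1000), rng.normal(1.0, 0.05, 1000)])
print(len(x), "observations ->", len(run_length_blocks(x)), "blocks")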