Search Results Heading

MBRLSearchResults

mbrl.module.common.modules.added.book.to.shelf
Title added to your shelf!
View what I already have on My Shelf.
Oops! Something went wrong.
Oops! Something went wrong.
While trying to add the title to your shelf something went wrong :( Kindly try again later!
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
    Done
    Filters
    Reset
  • Discipline
      Discipline
      Clear All
      Discipline
  • Is Peer Reviewed
      Is Peer Reviewed
      Clear All
      Is Peer Reviewed
  • Reading Level
      Reading Level
      Clear All
      Reading Level
  • Content Type
      Content Type
      Clear All
      Content Type
  • Year
      Year
      Clear All
      From:
      -
      To:
  • More Filters
      More Filters
      Clear All
      More Filters
      Item Type
    • Is Full-Text Available
    • Subject
    • Publisher
    • Source
    • Donor
    • Language
    • Place of Publication
    • Contributors
    • Location
13,716 result(s) for "Biology Research Data processing."
Sort by:
mixOmics: An R package for ‘omics feature selection and multiple data integration
The advent of high throughput technologies has led to a wealth of publicly available 'omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a 'molecular signature') to explain or predict biological conditions, but mainly for a single type of 'omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous 'omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple 'omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of 'omics data available from the package.
Modeling aspects of the language of life through transfer-learning protein sequences
Background Predicting protein function and structure from sequence is one important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome . Both these problems are addressed by the new methodology introduced here. Results We introduced a novel way to represent protein sequences as continuous vectors ( embeddings ) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec ( Seq uence-to- Vec tor) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information and for some proteins even did beat the best. Thus, they prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, i.e. microbiome or metaproteome analysis. Conclusion Transfer-learning succeeded to extract information from unlabeled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences better than any features suggested by textbooks and prediction methods. The exception is evolutionary information, however, that information is not available on the level of a single sequence.
BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods
The rate of progress in human neurosciences is limited by the inability to easily apply a wide range of analysis methods to the plethora of different datasets acquired in labs around the world. In this work, we introduce a framework for creating, testing, versioning and archiving portable applications for analyzing neuroimaging data organized and described in compliance with the Brain Imaging Data Structure (BIDS). The portability of these applications (BIDS Apps) is achieved by using container technologies that encapsulate all binary and other dependencies in one convenient package. BIDS Apps run on all three major operating systems with no need for complex setup and configuration and thanks to the comprehensiveness of the BIDS standard they require little manual user input. Previous containerized data processing solutions were limited to single user environments and not compatible with most multi-tenant High Performance Computing systems. BIDS Apps overcome this limitation by taking advantage of the Singularity container technology. As a proof of concept, this work is accompanied by 22 ready to use BIDS Apps, packaging a diverse set of commonly used neuroimaging algorithms.
MUMmer4: A fast and versatile genome alignment system
The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMer4 one the most versatile genome alignment packages available.
A population-based phenome-wide association study of cardiac and aortic structure and function
Differences in cardiac and aortic structure and function are associated with cardiovascular diseases and a wide range of other types of disease. Here we analyzed cardiovascular magnetic resonance images from a population-based study, the UK Biobank, using an automated machine-learning-based analysis pipeline. We report a comprehensive range of structural and functional phenotypes for the heart and aorta across 26,893 participants, and explore how these phenotypes vary according to sex, age and major cardiovascular risk factors. We extended this analysis with a phenome-wide association study, in which we tested for correlations of a wide range of non-imaging phenotypes of the participants with imaging phenotypes. We further explored the associations of imaging phenotypes with early-life factors, mental health and cognitive function using both observational analysis and Mendelian randomization. Our study illustrates how population-based cardiac and aortic imaging phenotypes can be used to better define cardiovascular disease risks as well as heart–brain health interactions, highlighting new opportunities for studying disease mechanisms and developing image-based biomarkers. Using magnetic resonance images of the heart and aorta from 26,893 individuals in the UK Biobank, a phenome-wide association study associates cardiovascular imaging phenotypes with a wide range of demographic, lifestyle and clinical features.
Ten quick tips for effective dimensionality reduction
Both a means of denoising and simplification, it can be beneficial for the majority of modern biological datasets, in which it’s not uncommon to have hundreds or even millions of simultaneous measurements collected for a single sample. Because of “the curse of dimensionality,” many statistical methods lack power when applied to high-dimensional data. Formally, the Marchenko–Pastur distribution asymptotically models the distribution of the singular values of large random matrices. [...]for datasets large in both the number of observations and features, you use a rule of retaining only eigenvalues outside the support of the fitted Marchenko–Pastur distribution; however, remember that this applies only when your data have at least thousands of samples and thousands of features. [...]the height-to-width ratio of a PCA plot should be consistent with the ratio between the corresponding eigenvalues. Because eigenvalues reflect the variance in coordinates of the associated PCs, you only need to ensure that in the plots, one \"unit\" in direction of one PC has the same length as one \"unit\" in direction of another PC. Because batch effects can confound the signal of interest, it is a good practice to check for their presence and, if found, to remove them before proceeding with further downstream analysis.
Evolution of Translational Omics
Technologies collectively called omics enable simultaneous measurement of an enormous number of biomolecules; for example, genomics investigates thousands of DNA sequences, and proteomics examines large numbers of proteins. Scientists are using these technologies to develop innovative tests to detect disease and to predict a patient's likelihood of responding to specific drugs. Following a recent case involving premature use of omics-based tests in cancer clinical trials at Duke University, the NCI requested that the IOM establish a committee to recommend ways to strengthen omics-based test development and evaluation. This report identifies best practices to enhance development, evaluation, and translation of omics-based tests while simultaneously reinforcing steps to ensure that these tests are appropriately assessed for scientific validity before they are used to guide patient treatment in clinical trials.
U-Net: deep learning for cell counting, detection, and morphometry
A user-friendly ImageJ plugin enables the application and training of U-Nets for deep-learning-based image segmentation, detection and classification tasks with minimal labeling requirements.