Catalogue Search | MBRL

Points of Significance: Principal component analysis

by Altman, Naomi; Krzywinski, Martin; Lever, Jake in Analysis of variance, Data analysis, Methods

2017

Principal component analysis (PCA) simplifies the complexity in high-dimensional data while retaining trends and patterns. It does this by transforming the data into fewer dimensions, which act as summaries of features.

Journal Article

Using clusterProfiler to characterize multiomics data

by Xie, Liwei; Xu, Shuangbin; Tang, Wenli in 631/114/2164, 631/114/794, 631/1647/320

2024

With the advent of multiomics, software capable of multidimensional enrichment analysis has become increasingly crucial for uncovering gene set variations in biological processes and disease pathways. This is essential for elucidating disease mechanisms and identifying potential therapeutic targets. clusterProfiler stands out for its comprehensive utilization of databases and advanced visualization features. Importantly, clusterProfiler supports various biological knowledge, including Gene Ontology and Kyoto Encyclopedia of Genes and Genomes, through performing over-representation and gene set enrichment analyses. A key feature is that clusterProfiler allows users to choose from various graphical outputs to visualize results, enhancing interpretability. This protocol describes innovative ways in which clusterProfiler has been used for integrating metabolomics and metagenomics analyses, identifying and characterizing transcription factors under stress conditions, and annotating cells in single-cell studies. In all cases, the computational steps can be completed within ~2 min. clusterProfiler is released through the Bioconductor project and can be accessed via https://bioconductor.org/packages/clusterProfiler/.

Key points: clusterProfiler is a software package for characterizing and interpreting omics data. Functional enrichment can be achieved using either over-representation or gene set enrichment analyses; it supports the use of a variety of databases, e.g., Gene Ontology and Kyoto Encyclopedia of Genes and Genomes. Three procedures show specific R commands for example applications asking different research questions and having different graphical outputs. Advice is provided on how to modify the procedures for other applications.

clusterProfiler is a tool for characterizing and visualizing omics data. The example procedures show integration of metabolomics and metagenomics analyses, characterization of transcription factors and annotation of cells in single-cell studies.

Journal Article

Large-scale foundation model on single-cell transcriptomics

by Zhang, Xuegong; Hao, Minsheng; Gong, Jing in 631/114/1305, 631/114/2397, 631/1647/794

2024

Large pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the ‘languages’ of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model scFoundation, also named ‘xTrimoscFoundation α’, with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the size of trainable parameters, dimensionality of genes and volume of training data. Its asymmetric transformer-like architecture and pretraining task design empower effectively capturing complex context relations among genes in a variety of cell types and states. Experiments showed its merit as a foundation model that achieved state-of-the-art performances in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.

scFoundation, with 100 million parameters covering about 20,000 genes, pretrained on over 50 million single-cell transcriptomics profiles, is a foundation model for diverse tasks of single-cell analysis.

Journal Article

Points of Significance: Statistics versus machine learning

by Bzdok, Danilo; Altman, Naomi; Krzywinski, Martin in Algorithms, Experiments, Gene expression

2018

Journal Article

Polypolish: Short-read polishing of long-read bacterial genome assemblies

by Wick, Ryan R.; Holt, Kathryn E. in Accuracy, Assemblies, Biology and Life Sciences

2022

Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher which uses all-per-read alignments to repair errors in repeat sequences that other polishers cannot. Polypolish performed well in benchmarking tests using both simulated and real reads, and it almost never introduced errors during polishing. The best results were achieved by using Polypolish in combination with other short-read polishers.

Journal Article

Method of the Year: spatially resolved transcriptomics

in Gene expression, Genomics, Hybridization

2021

Nature Methods has crowned spatially resolved transcriptomics Method of the Year 2020.

Journal Article

CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning

by Parks, Donovan H.; Tyson, Gene W.; Woodcroft, Ben J. in 631/114/1305, 631/114/2785, 631/114/794

2023

Advances in sequencing technologies and bioinformatics tools have dramatically increased the recovery rate of microbial genomes from metagenomic data. Assessing the quality of metagenome-assembled genomes (MAGs) is a critical step before downstream analysis. Here, we present CheckM2, an improved method of predicting genome quality of MAGs using machine learning. Using synthetic and experimental data, we demonstrate that CheckM2 outperforms existing tools in both accuracy and computational speed. In addition, CheckM2’s database can be rapidly updated with new high-quality reference genomes, including taxa represented only by a single genome. We also show that CheckM2 accurately predicts genome quality for MAGs from novel lineages, even for those with reduced genome size (for example, Patescibacteria and the DPANN superphylum). CheckM2 provides accurate genome quality predictions across bacterial and archaeal lineages, giving increased confidence when inferring biological conclusions from MAGs.

This work presents CheckM2, which is a machine learning-based tool to predict genome quality of isolate, single-cell and metagenome-assembled genomes.

Journal Article

Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm

by Concepcion, Gregory T; Cheng, Haoyu; Li, Heng in Accuracy, Algorithms, Assemblies

2021

Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.

Hifiasm is a haplotype-resolved de novo genome assembler for long-read high-fidelity sequencing data based on phased assembly graphs.

Journal Article

OpenMM 7: Rapid development of high performance algorithms for molecular dynamics

by Chodera, John D.; Eastman, Peter; Pande, Vijay S. in Algorithms, Biology, Biology and Life Sciences

2017

OpenMM is a molecular dynamics simulation toolkit with a unique focus on extensibility. It allows users to easily add new features, including forces with novel functional forms, new integration algorithms, and new simulation protocols. Those features automatically work on all supported hardware types (including both CPUs and GPUs) and perform well on all of them. In many cases they require minimal coding, just a mathematical description of the desired function. They also require no modification to OpenMM itself and can be distributed independently of OpenMM. This makes it an ideal tool for researchers developing new simulation methods, and also allows those new methods to be immediately available to the larger community.

Journal Article

Multivariable association discovery in population-scale meta-omics studies

by Zhang, Yancong; Weingart, George; Ma, Siyuan in Analysis, Biology and Life Sciences, Computational Biology

2021

It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2’s linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.

Journal Article
