20,589 results for "Biology -- Research -- Statistical methods"
Statistical advances in the biomedical sciences
The Most Comprehensive and Cutting-Edge Guide to Statistical Applications in Biomedical Research. With the increasing use of biotechnology in medical research and sophisticated advances in computing, it has become essential for practitioners in the biomedical sciences to be fully educated on the role statistics plays in ensuring the accurate analysis of research findings. Statistical Advances in the Biomedical Sciences explores the growing value of statistical knowledge in the management and comprehension of medical research and, more specifically, provides an accessible introduction to the contemporary methodologies used to understand complex problems in the four major areas of modern-day biomedical science: clinical trials, epidemiology, survival analysis, and bioinformatics. Composed of contributions from eminent researchers in the field, this volume discusses the application of statistical techniques to various aspects of modern medical research and illustrates how these methods ultimately prove to be an indispensable part of proper data collection and analysis. A structural uniformity is maintained across all chapters: each begins with an introduction that discusses general concepts and the biomedical problem under focus, followed by specific details on the associated methods, algorithms, and applications. In addition, each chapter provides a summary of the main ideas and offers a concluding remarks section that presents novel ideas, approaches, and challenges for future research. Complete with detailed references and insight on the future directions of biomedical research, Statistical Advances in the Biomedical Sciences provides vital statistical guidance to practitioners in the biomedical sciences while also introducing statisticians to new, multidisciplinary frontiers of application. This text is an excellent reference for graduate- and PhD-level courses in various areas of biostatistics and the medical sciences and also serves as a valuable tool for medical researchers, statisticians, public health professionals, and biostatisticians.
mixOmics: An R package for ‘omics feature selection and multiple data integration
The advent of high-throughput technologies has led to a wealth of publicly available 'omics data coming from different sources, such as transcriptomics, proteomics, and metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have focused on identifying small subsets of molecules (a 'molecular signature') to explain or predict biological conditions, but mainly for a single type of 'omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets, with a specific focus on data exploration, dimension reduction, and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous 'omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple 'omics data sets or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of 'omics data available from the package.
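To make the feature-selection use case concrete, here is a minimal sketch of a sparse PLS-DA analysis with mixOmics, assuming the package's bundled srbct example data set; it is an illustration of the kind of workflow the abstract describes, not a prescribed pipeline.

    # Minimal sketch of a mixOmics sparse PLS-DA workflow.
    # Assumes the package's bundled 'srbct' example data; swap in real data as needed.
    library(mixOmics)

    data(srbct)
    X <- srbct$gene    # samples x features expression matrix
    Y <- srbct$class   # factor of biological conditions

    # Sparse PLS-DA: discriminant analysis with built-in feature selection,
    # keeping 50 genes per component to extract a molecular signature.
    fit <- splsda(X, Y, ncomp = 2, keepX = c(50, 50))

    plotIndiv(fit, legend = TRUE)    # sample projection onto the latent components
    selectVar(fit, comp = 1)$name    # genes selected on component 1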
Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data
Background: Surveys of presences and absences of specific species across multiple biogeographic units (or bioregions) are used in a broad range of biological studies, from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has been seldom used or studied. Results: We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard/Tanimoto coefficient. Several key improvements are presented, including unbiased estimation of the expectation and centered Jaccard/Tanimoto coefficients that account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome the computational burden due to high dimensionality, we propose bootstrap and measurement-concentration algorithms to efficiently estimate the statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly with increasing dimensionality. We showcase their applications in evaluating co-occurrences of bird species on 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open-source R package called jaccard (https://cran.r-project.org/package=jaccard). Conclusion: We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient for binary data that enable straightforward incorporation of probabilistic measures into analyses of species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.
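To illustrate the question being tested (is an observed Jaccard coefficient larger than chance co-occurrence would produce?), here is a self-contained base-R sketch using a naive permutation null. This is not the package's centered, bootstrap, or exact estimators; for real analyses, use the jaccard package cited above.

    # Sketch: Jaccard coefficient between two presence-absence vectors plus a
    # simple permutation null. Illustrative only; the jaccard package provides
    # much faster and less biased estimators.
    jaccard_coef <- function(x, y) {
      u <- sum(x | y)
      if (u == 0) return(0)
      sum(x & y) / u
    }

    set.seed(1)
    x <- rbinom(100, 1, 0.3)                                        # species 1 across 100 sites
    y <- ifelse(x == 1 & runif(100) < 0.5, 1, rbinom(100, 1, 0.2))  # correlated species 2

    obs  <- jaccard_coef(x, y)
    null <- replicate(2000, jaccard_coef(x, sample(y)))  # permute site labels
    p    <- mean(null >= obs)                            # one-sided permutation p-value
    c(jaccard = obs, p.value = p)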
Ten quick tips for effective dimensionality reduction
Both a means of denoising and a means of simplification, dimensionality reduction can be beneficial for the majority of modern biological datasets, in which it is not uncommon to have hundreds or even millions of simultaneous measurements collected for a single sample. Because of "the curse of dimensionality," many statistical methods lack power when applied to high-dimensional data. Formally, the Marchenko–Pastur distribution asymptotically models the distribution of the singular values of large random matrices. [...] For datasets large in both the number of observations and the number of features, you can use a rule of retaining only eigenvalues outside the support of the fitted Marchenko–Pastur distribution; however, remember that this applies only when your data have at least thousands of samples and thousands of features. [...] The height-to-width ratio of a PCA plot should be consistent with the ratio between the corresponding eigenvalues. Because eigenvalues reflect the variance in coordinates of the associated PCs, you only need to ensure that, in the plots, one "unit" in the direction of one PC has the same length as one "unit" in the direction of another PC. Because batch effects can confound the signal of interest, it is good practice to check for their presence and, if found, to remove them before proceeding with further downstream analysis.
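A minimal sketch of the Marchenko–Pastur retention rule in R, assuming the data are centered and scaled so the noise variance is approximately 1 (a real analysis might fit the noise variance rather than use the unit-variance upper edge assumed here):

    # Sketch: retain only eigenvalues above the Marchenko-Pastur upper edge.
    set.seed(1)
    n <- 2000; p <- 1000
    X <- matrix(rnorm(n * p), n, p)      # pure-noise background
    X[, 1:5] <- X[, 1:5] + rnorm(n) * 3  # inject one shared signal direction

    ev <- svd(scale(X))$d^2 / (n - 1)    # eigenvalues of the sample covariance
    gamma <- p / n
    mp_upper <- (1 + sqrt(gamma))^2      # upper edge of the MP support (unit noise variance)

    keep <- which(ev > mp_upper)         # components retained as signal
    length(keep)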
Statistical power for cluster analysis
Background: Cluster algorithms are gaining in popularity in biomedical research due to their compelling ability to identify discrete subgroups in data, and their increasing accessibility in mainstream software. While guidelines exist for algorithm selection and outcome evaluation, there are no firmly established ways of computing a priori statistical power for cluster analysis. Here, we estimated power and classification accuracy for common analysis pipelines through simulation. We systematically varied subgroup size, number, separation (effect size), and covariance structure. We then subjected generated datasets to dimensionality reduction approaches (none, multi-dimensional scaling, or uniform manifold approximation and projection) and cluster algorithms (k-means, agglomerative hierarchical clustering with Ward or average linkage and Euclidean or cosine distance, HDBSCAN). Finally, we directly compared the statistical power of discrete (k-means), "fuzzy" (c-means), and finite mixture modelling approaches (which include latent class analysis and latent profile analysis). Results: We found that clustering outcomes were driven by large effect sizes or the accumulation of many smaller effects across features, and were mostly unaffected by differences in covariance structure. Sufficient statistical power was achieved with relatively small samples (N = 20 per subgroup), provided cluster separation is large (Δ = 4). Finally, we demonstrated that fuzzy clustering can provide a more parsimonious and powerful alternative for identifying separable multivariate normal distributions, particularly those with slightly lower centroid separation (Δ = 3). Conclusions: Traditional intuitions about statistical power only partially apply to cluster analysis: increasing the number of participants above a sufficient sample size did not improve power, but effect size was crucial. Notably, for the popular dimensionality reduction and clustering algorithms tested here, power was only satisfactory for relatively large effect sizes (clear separation between subgroups). Fuzzy clustering provided higher power in multivariate normal distributions. Overall, we recommend that researchers (1) only apply cluster analysis when large subgroup separation is expected, (2) aim for sample sizes of N = 20 to N = 30 per expected subgroup, (3) use multi-dimensional scaling to improve cluster separation, and (4) use fuzzy clustering or mixture modelling approaches that are more powerful and more parsimonious with partially overlapping multivariate normal distributions.
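The simulation logic is easy to sketch in base R: generate two Gaussian subgroups whose centroids sit Δ standard deviations apart, cluster, and count how often the subgroups are recovered. This is a simplified illustration under assumed settings (the paper's pipelines, dimensionality reduction steps, and accuracy criterion differ).

    # Sketch of a power simulation for k-means: power = proportion of simulated
    # datasets in which subgroup recovery accuracy exceeds a chosen cutoff.
    sim_power <- function(n_per = 20, p = 10, delta = 4, reps = 200, acc_cut = 0.8) {
      hits <- replicate(reps, {
        # delta / sqrt(p) per coordinate gives a total centroid separation of delta
        X <- rbind(matrix(rnorm(n_per * p), n_per, p),
                   matrix(rnorm(n_per * p, mean = delta / sqrt(p)), n_per, p))
        truth <- rep(1:2, each = n_per)
        cl <- kmeans(X, centers = 2, nstart = 10)$cluster
        acc <- max(mean(cl == truth), mean(cl == 3 - truth))  # handle label switching
        acc >= acc_cut
      })
      mean(hits)
    }

    set.seed(1)
    sim_power(n_per = 20, delta = 4)  # large separation, small samples: high power
    sim_power(n_per = 20, delta = 1)  # weak separation: power collapses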
The triumphs and limitations of computational methods for scRNA-seq
The rapid progress of protocols for sequencing single-cell transcriptomes over the past decade has been accompanied by equally impressive advances in the computational methods for analysis of such data. As capacity and accuracy of the experimental techniques grew, the emerging algorithm developments revealed increasingly complex facets of the underlying biology, from cell type composition to gene regulation to developmental dynamics. At the same time, rapid growth has forced continuous reevaluation of the underlying statistical models, experimental aims, and sheer volumes of data processing that are handled by these computational tools. Here, I review key computational steps of single-cell RNA sequencing (scRNA-seq) analysis, examine assumptions made by different approaches, and highlight successes, remaining ambiguities, and limitations that are important to keep in mind as scRNA-seq becomes a mainstream technique for studying biology. This review provides an overview of recent computational developments in scRNA-seq analysis and highlights packages and tools applied in executing these analyses.
SARTools: A DESeq2- and EdgeR-Based R Pipeline for Comprehensive Differential Analysis of RNA-Seq Data
Several R packages exist for the detection of differentially expressed genes from RNA-Seq data. The analysis process includes three main steps, namely normalization, dispersion estimation, and testing for differential expression. Quality control steps along this process are recommended but not mandatory, and failing to check the characteristics of the dataset may lead to spurious results. In addition, normalization methods and statistical models are not exchangeable across the packages without adequate transformations of which users are often unaware. Thus, dedicated analysis pipelines are needed to include systematic quality control steps and prevent errors from misusing the proposed methods. SARTools is an R pipeline for differential analysis of RNA-Seq count data. It can handle designs involving two or more conditions of a single biological factor, with or without a blocking factor (such as a batch effect or a sample pairing). It is based on DESeq2 and edgeR and is composed of an R package and two R script templates (for DESeq2 and edgeR, respectively). By tuning a small number of parameters and executing one of the R scripts, users have access to the full results of the analysis, including lists of differentially expressed genes and an HTML report that (i) displays diagnostic plots for quality control and model hypothesis checking and (ii) keeps track of the whole analysis process, parameter values, and versions of the R packages used. SARTools provides systematic quality controls of the dataset as well as diagnostic plots that help to tune the model parameters. It gives access to the main parameters of DESeq2 and edgeR and prevents untrained users from misusing some functionalities of both packages. By keeping track of all the parameters of the analysis process, it fits the requirements of reproducible research.
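For orientation, here is a minimal sketch of the core DESeq2 steps (normalization, dispersion estimation, testing) that SARTools wraps and reports on. This is plain DESeq2 usage on simulated counts, not the SARTools script templates themselves.

    # Minimal DESeq2 sketch of the three analysis steps SARTools automates.
    library(DESeq2)

    # Simulated genes x samples count matrix; replace with real data.
    set.seed(1)
    counts <- matrix(rnbinom(4000, mu = 100, size = 1), ncol = 4,
                     dimnames = list(paste0("gene", 1:1000), paste0("s", 1:4)))
    coldata <- data.frame(condition = factor(c("ctrl", "ctrl", "trt", "trt")))

    dds <- DESeqDataSetFromMatrix(countData = counts,
                                  colData   = coldata,
                                  design    = ~ condition)
    dds <- DESeq(dds)    # size factors, dispersion estimation, Wald tests
    res <- results(dds)  # log2 fold changes and adjusted p-values
    head(res[order(res$padj), ])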
Quantitative methods for health research
A practical introduction to epidemiology, biostatistics, and research methodology for the whole health care community. This comprehensive text, which has been extensively revised with new material and additional topics, takes a practical slant in introducing health professionals and students to epidemiology, biostatistics, and research methodology. It draws examples from a wide range of topics, covering all of the main contemporary health research methods, including survival analysis, Cox regression, and systematic reviews and meta-analysis, the explanations of which go beyond introductory concepts. This second edition of Quantitative Methods for Health Research: A Practical Interactive Guide to Epidemiology and Statistics also helps develop critical skills that will prepare students to move on to more advanced and specialized methods. A clear distinction is made between knowledge and concepts that all students should ensure they understand and those that can be pursued further by those who wish to do so. Self-assessment exercises throughout the text help students explore and reflect on their understanding. A program of practical exercises in SPSS (using a prepared data set) helps to consolidate the theory and develop skills and confidence in data handling, analysis, and interpretation. Highlights of the book include:
* Combining epidemiology and biostatistics to demonstrate the relevance and strength of statistical methods
* Emphasis on the interpretation of statistics, using examples from a variety of public health and health care situations to stress relevance and application
* Use of concepts related to examples of published research to show the application of methods and the balance between ideals and the realities of research in practice
* Integration of practical data analysis exercises to develop skills and confidence
* Supplementation by a student companion website which provides guidance on data handling in SPSS and the study data sets referred to in the text
Quantitative Methods for Health Research, Second Edition is a practical learning resource for students, practitioners, and researchers in public health, health care, and related disciplines, providing both a course book and a useful introductory reference.
PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph
The use of comparative genomics for functional, evolutionary, and epidemiological studies requires methods to classify gene families in terms of occurrence in a given species. These methods usually lack multivariate statistical models to infer the partitions and the optimal number of classes, and they do not account for genome organization. We introduce a graph structure to model pangenomes in which nodes represent gene families and edges represent genomic neighborhood. Our method, named PPanGGOLiN, partitions nodes using an Expectation-Maximization algorithm based on a multivariate Bernoulli Mixture Model coupled with a Markov Random Field. This approach takes into account the topology of the graph and the presence/absence of genes in pangenomes to classify gene families into persistent, cloud, and one or several shell partitions. By analyzing the partitioned pangenome graphs of isolate genomes from 439 species and metagenome-assembled genomes from 78 species, we demonstrate that our method is effective in estimating the persistent genome. Interestingly, it shows that the shell genome is a key element in understanding genome dynamics, presumably because it reflects how genes present at intermediate frequencies drive the adaptation of species, and that its proportion in genomes is independent of genome size. The graph-based approach proposed by PPanGGOLiN is useful for depicting the overall genomic diversity of thousands of strains in a compact structure and provides an effective basis for very large-scale comparative genomics. The software is freely available at https://github.com/labgem/PPanGGOLiN.
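The statistical core is a multivariate Bernoulli mixture fitted by EM over a families x genomes presence/absence matrix. The sketch below illustrates that core alone; it deliberately omits PPanGGOLiN's Markov Random Field term, which additionally smooths partition labels along the pangenome graph, and the toy data and K = 3 choice are assumptions for illustration.

    # Sketch: EM for a multivariate Bernoulli mixture on presence/absence data
    # (no MRF smoothing, unlike PPanGGOLiN itself).
    bernoulli_mixture_em <- function(M, K = 3, iters = 100) {
      n <- nrow(M); d <- ncol(M)
      pi_k  <- rep(1 / K, K)                         # mixing proportions
      theta <- matrix(runif(K * d, 0.2, 0.8), K, d)  # per-class presence probabilities
      for (it in seq_len(iters)) {
        # E-step: class responsibilities from Bernoulli log-likelihoods
        logp <- sapply(seq_len(K), function(k)
          M %*% log(theta[k, ]) + (1 - M) %*% log(1 - theta[k, ]) + log(pi_k[k]))
        logp <- logp - apply(logp, 1, max)           # stabilize before exponentiating
        r <- exp(logp); r <- r / rowSums(r)
        # M-step: update proportions and probabilities (clamped away from 0/1)
        pi_k  <- colMeans(r)
        theta <- pmin(pmax(t(r) %*% M / colSums(r), 1e-6), 1 - 1e-6)
      }
      list(class = max.col(r), pi = pi_k, theta = theta)
    }

    set.seed(1)
    # Toy pangenome: persistent-, shell-, and cloud-like families across 20 genomes
    M <- rbind(matrix(rbinom(50 * 20, 1, 0.95), 50),
               matrix(rbinom(30 * 20, 1, 0.50), 30),
               matrix(rbinom(40 * 20, 1, 0.05), 40))
    fit <- bernoulli_mixture_em(M, K = 3)
    table(fit$class)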