1,365 result(s) for "Biometry -- Computer simulation"
Biostatistics : a computing approach
The emergence of high-speed computing has facilitated the development of many exciting statistical and mathematical methods in the last 25 years, broadening the landscape of available tools in statistical investigations of complex data. Biostatistics: A Computing Approach focuses on visualization and computational approaches associated with both modern and classical techniques. Furthermore, it promotes computing as a tool for performing both analyses and simulations that can facilitate such understanding. As a practical matter, programs in R and SAS are presented throughout the text. In addition to these programs, appendices describing the basic use of SAS and R are provided. Teaching by example, this book emphasizes the importance of simulation and numerical exploration in a modern-day statistical investigation. A few statistical methods that can be implemented with simple calculations are also worked into the text to build insight about how the methods really work. Suitable for students who have an interest in the application of statistical methods but do not necessarily intend to become statisticians, this book has been developed from Introduction to Biostatistics II, which the author taught for more than a decade at the University of Pittsburgh.
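The kind of simulation-driven insight the book advocates can be sketched in a few lines: checking by Monte Carlo that a nominal 95% t-interval for a normal mean really covers the truth about 95% of the time. All parameters below are illustrative, and the example uses Python rather than the book's R/SAS.

```python
# Monte Carlo check of confidence-interval coverage (illustrative sketch).
import random
import statistics
import math

def t_interval_covers(true_mean, sd, n, t_crit):
    """Draw one sample and report whether the t-interval covers true_mean."""
    sample = [random.gauss(true_mean, sd) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (m - t_crit * se) <= true_mean <= (m + t_crit * se)

random.seed(1)
n_sims = 2000
t_crit = 2.045  # ~97.5th percentile of t with df = 29
coverage = sum(t_interval_covers(5.0, 2.0, 30, t_crit) for _ in range(n_sims)) / n_sims
print(f"empirical coverage: {coverage:.3f}")  # should land near 0.95
```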
Estimation of the Optimal Surrogate Based on a Randomized Trial
A common scientific problem is to determine a surrogate outcome for a long-term outcome so that future randomized studies can restrict themselves to only collecting the surrogate outcome. We consider the setting in which we observe independent and identically distributed observations of a random variable consisting of baseline covariates, a treatment, a vector of candidate surrogate outcomes at an intermediate time point, and the final outcome of interest at a final time point. We assume the treatment is randomized, conditional on the baseline covariates. The goal is to use these data to learn a most promising surrogate for use in future trials for inference about a mean contrast treatment effect on the final outcome. We define an optimal surrogate for the current study as the function of the data generating distribution collected by the intermediate time point that satisfies the Prentice definition of a valid surrogate endpoint and that optimally predicts the final outcome: this optimal surrogate is an unknown parameter. We show that this optimal surrogate is a conditional mean and present super-learner and targeted super-learner based estimators, whose predicted outcomes are used as the surrogate in applications. We demonstrate a number of desirable properties of this optimal surrogate and its estimators, and study the methodology in simulations and an application to dengue vaccine efficacy trials.
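Since the optimal surrogate is a conditional mean of the final outcome given everything observed by the intermediate time point, any regression on (covariates, treatment, candidate surrogates) yields a one-dimensional predicted outcome that can play the role of the combined surrogate. The toy sketch below uses plain least squares as a stand-in for the authors' super-learner, on entirely simulated data with invented coefficients.

```python
# Toy stand-in for the conditional-mean surrogate: regress the final
# outcome on baseline covariates W, treatment A, and candidate surrogates S,
# then use the fitted values as the learned one-dimensional surrogate.
import numpy as np

rng = np.random.default_rng(0)
n = 500
W = rng.normal(size=(n, 2))                        # baseline covariates
A = rng.integers(0, 2, size=n).astype(float)       # randomized treatment
S = 0.8 * A[:, None] + W + rng.normal(size=(n, 2)) # candidate surrogates
Y = 1.5 * S[:, 0] - 0.5 * S[:, 1] + 0.2 * A + rng.normal(size=n)

X = np.column_stack([np.ones(n), W, A, S])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
surrogate = X @ beta            # predicted final outcome = learned surrogate
corr = float(np.corrcoef(surrogate, Y)[0, 1])
print(round(corr, 3))
```

In practice the least-squares fit would be replaced by a (targeted) super-learner ensemble, but the output used downstream is the same: the predicted outcome.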
Causal mediation analysis with multiple mediators
In diverse fields of empirical research—including many in the biological sciences—attempts are made to decompose the effect of an exposure on an outcome into its effects via a number of different pathways. For example, we may wish to separate the effect of heavy alcohol consumption on systolic blood pressure (SBP) into effects via body mass index (BMI), via gamma-glutamyl transpeptidase (GGT), and via other pathways. Much progress has been made, mainly due to contributions from the field of causal inference, in understanding the precise nature of statistical estimands that capture such intuitive effects, the assumptions under which they can be identified, and statistical methods for doing so. These contributions have focused almost entirely on settings with a single mediator, or a set of mediators considered en bloc; in many applications, however, researchers attempt a much more ambitious decomposition into numerous path-specific effects through many mediators. In this article, we give counterfactual definitions of such path-specific estimands in settings with multiple mediators, when earlier mediators may affect later ones, showing that there are many ways in which decomposition can be done. We discuss the strong assumptions under which the effects are identified, suggesting a sensitivity analysis approach when a particular subset of the assumptions cannot be justified. These ideas are illustrated using data on alcohol consumption, SBP, BMI, and GGT from the Izhevsk Family Study. We aim to bridge the gap from "single mediator theory" to "multiple mediator practice," highlighting the ambitious nature of this endeavor and giving practical suggestions on how to proceed.
Quantifying Publication Bias in Meta-Analysis
Publication bias is a serious problem in systematic reviews and meta-analyses, which can affect the validity and generalization of conclusions. Currently, approaches to dealing with publication bias can be distinguished into two classes: selection models and funnel-plot-based methods. Selection models use weight functions to adjust the overall effect size estimate and are usually employed as sensitivity analyses to assess the potential impact of publication bias. Funnel-plot-based methods include visual examination of a funnel plot, regression and rank tests, and the nonparametric trim and fill method. Although these approaches have been widely used in applications, measures for quantifying publication bias are seldom studied in the literature. Such measures can be used as a characteristic of a meta-analysis; also, they permit comparisons of publication biases between different meta-analyses. Egger's regression intercept may be considered as a candidate measure, but it lacks an intuitive interpretation. This article introduces a new measure, the skewness of the standardized deviates, to quantify publication bias. This measure describes the asymmetry of the collected studies' distribution. In addition, a new test for publication bias is derived based on the skewness. Large sample properties of the new measure are studied, and its performance is illustrated using simulations and three case studies.
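Under one plausible reading of the measure (not necessarily the article's exact construction), each study's deviation from a fixed-effect pooled estimate is standardized by its standard error, and the sample skewness of those deviates summarizes funnel-plot asymmetry. The effect sizes and standard errors below are invented.

```python
# Sketch: skewness of standardized deviates as a publication-bias measure.
import numpy as np

def skewness_of_deviates(y, se):
    """Sample skewness of (y_i - pooled) / se_i, pooling by inverse variance."""
    y, se = np.asarray(y, float), np.asarray(se, float)
    w = 1.0 / se**2
    pooled = np.sum(w * y) / np.sum(w)       # fixed-effect pooled estimate
    d = (y - pooled) / se                    # standardized deviates
    d = d - d.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

y  = [0.42, 0.35, 0.61, 0.55, 0.30, 0.72, 0.15]   # hypothetical effect sizes
se = [0.10, 0.12, 0.20, 0.18, 0.09, 0.25, 0.08]   # hypothetical standard errors
sk = skewness_of_deviates(y, se)
print(round(float(sk), 3))
```

A markedly nonzero skewness flags an asymmetric study distribution, the same signal a funnel plot shows visually.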
Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data
Background Surveys of presences and absences of specific species across multiple biogeographic units (or bioregions) are used in a broad range of biological studies, from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has been seldom used or studied. Results We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard/Tanimoto coefficient. Several key improvements are presented, including unbiased estimation of expectation and centered Jaccard/Tanimoto coefficients that account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome a computational burden due to high-dimensionality, we propose the bootstrap and measurement concentration algorithms to efficiently estimate statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly with increasing dimensionality. We showcase their applications in evaluating co-occurrences of bird species in 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open-source R package called jaccard (https://cran.r-project.org/package=jaccard).
Conclusion We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient for binary data that enable straightforward incorporation of probabilistic measures into analyses of species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.
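The coefficient itself is simple: the size of the intersection over the size of the union of two presence-absence vectors. The permutation test below is a naive stand-in for the package's bootstrap and asymptotic machinery, included only to illustrate the idea of attaching a significance level to the coefficient; the occurrence vectors are invented.

```python
# Jaccard/Tanimoto coefficient for binary presence-absence vectors,
# plus a naive permutation p-value (illustrative, not the jaccard package).
import random

def jaccard(x, y):
    inter = sum(a and b for a, b in zip(x, y))
    union = sum(a or b for a, b in zip(x, y))
    return inter / union if union else 0.0

def permutation_pvalue(x, y, n_perm=2000, seed=7):
    """Share of occurrence-preserving shuffles of y with similarity >= observed."""
    rng = random.Random(seed)
    observed = jaccard(x, y)
    y = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if jaccard(x, y) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

x = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0]  # species A occurrences across sites
y = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0]  # species B occurrences across sites
print(jaccard(x, y), permutation_pvalue(x, y))
```

Shuffling preserves each species' marginal occurrence count, which is the null model of independent placement the abstract's centered coefficient also targets.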
Improved nonparametric lower bound of species richness via a modified Good–Turing frequency formula
It is difficult to accurately estimate species richness if there are many almost undetectable species in a hyper‐diverse community. Practically, an accurate lower bound for species richness is preferable to an inaccurate point estimator. The traditional nonparametric lower bound developed by Chao (1984, Scandinavian Journal of Statistics 11, 265–270) for individual‐based abundance data uses only the information on the rarest species (the numbers of singletons and doubletons) to estimate the number of undetected species in samples. Applying a modified Good–Turing frequency formula, we derive an approximate formula for the first‐order bias of this traditional lower bound. The approximate bias is estimated by using additional information (namely, the numbers of tripletons and quadrupletons). This approximate bias can be corrected, and an improved lower bound is thus obtained. The proposed lower bound is nonparametric in the sense that it is universally valid for any species abundance distribution. A similar type of improved lower bound can be derived for incidence data. We test our proposed lower bounds on simulated data sets generated from various species abundance models. Simulation results show that the proposed lower bounds always reduce bias over the traditional lower bounds and improve accuracy (as measured by mean squared error) when the heterogeneity of species abundances is relatively high. We also apply the proposed new lower bounds to real data for illustration and for comparisons with previously developed estimators.
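The traditional Chao (1984) bound that the abstract improves upon uses only the singleton count f1 and doubleton count f2. The sketch below implements that classic form (not the improved bound, which additionally uses tripletons and quadrupletons); the abundance sample is hypothetical.

```python
# Classic Chao1 nonparametric lower bound for species richness.
from collections import Counter

def chao1(abundances):
    """Chao1 lower bound from per-species abundance counts."""
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)                      # observed species
    freq = Counter(counts)
    f1, f2 = freq.get(1, 0), freq.get(2, 0)  # singletons, doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # bias-corrected variant when no doubletons are observed
    return s_obs + f1 * (f1 - 1) / 2

# hypothetical sample: 8 observed species, 3 singletons, 2 doubletons
sample = [1, 1, 1, 2, 2, 4, 7, 12]
print(chao1(sample))  # 8 + 3**2 / (2 * 2) = 10.25
```

The improved bound of the article corrects the first-order bias of this estimate using f3 and f4, so it is always at least as large as Chao1.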
On the importance of statistics in molecular simulations for thermodynamics, kinetics and simulation box size
Computational simulations, akin to wet-lab experimentation, are subject to statistical fluctuations. Assessing the magnitude of these fluctuations, that is, assigning uncertainties to the computed results, is of critical importance to drawing statistically reliable conclusions. Here, we use the simulation box size as an independent variable to demonstrate how crucial it is to gather sufficient amounts of data before drawing any conclusions about the potential thermodynamic and kinetic effects. In various systems, ranging from solvation free energies to protein conformational transition rates, we showcase how the proposed simulation box size effect disappears with increased sampling. This indicates that, if at all, the simulation box size only minimally affects both the thermodynamics and kinetics of the type of biomolecular systems presented in this work.
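The abstract's point is that apparent box-size effects can be undersampling artifacts, which is why honest error bars matter. A standard way to attach an uncertainty to a time-correlated simulation observable is block averaging; the sketch below runs it on a synthetic AR(1) series (a generic stand-in for a simulation trajectory, not the authors' pipeline) and shows that the naive standard error badly understates the true uncertainty.

```python
# Block averaging: standard error of the mean for a correlated time series.
import numpy as np

def block_average_sem(series, n_blocks=10):
    """SEM estimated from the spread of non-overlapping block means."""
    series = np.asarray(series, float)
    usable = len(series) - len(series) % n_blocks
    blocks = series[:usable].reshape(n_blocks, -1).mean(axis=1)
    return blocks.std(ddof=1) / np.sqrt(n_blocks)

rng = np.random.default_rng(3)
# AR(1) series mimicking a correlated simulation observable
n = 10_000
x = np.empty(n)
x[0] = 0.0
noise = rng.normal(size=n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + noise[t]

naive_sem = x.std(ddof=1) / np.sqrt(n)   # pretends samples are independent
block_sem = block_average_sem(x)
print(f"naive SEM {naive_sem:.4f} vs block SEM {block_sem:.4f}")
```

With strong autocorrelation the block estimate is several times larger than the naive one; drawing conclusions from naive error bars is exactly how spurious "effects" survive.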
New Criterion for Confounder Selection
We propose a new criterion for confounder selection when the underlying causal structure is unknown and only limited knowledge is available. We assume all covariates being considered are pretreatment variables and that for each covariate it is known (i) whether the covariate is a cause of treatment, and (ii) whether the covariate is a cause of the outcome. The causal relationships the covariates have with one another are assumed unknown. We propose that control be made for any covariate that is either a cause of treatment or of the outcome or both. We show that irrespective of the actual underlying causal structure, if any subset of the observed covariates suffices to control for confounding then the set of covariates chosen by our criterion will also suffice. We show that other, commonly used, criteria for confounding control do not have this property. We use formal theory concerning causal diagrams to prove our result but the application of the result does not rely on familiarity with causal diagrams. An investigator simply needs to ask, “Is the covariate a cause of the treatment?” and “Is the covariate a cause of the outcome?” If the answer to either question is “yes” then the covariate is included for confounder control. We discuss some additional covariate selection results that preserve unconfoundedness and that may be of interest when used with our criterion.
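The criterion reduces to a two-question filter over pretreatment covariates. The covariate names and dictionary fields below are invented for illustration; the substance is just "keep any covariate that causes the treatment, the outcome, or both."

```python
# Two-question confounder-selection filter (names and fields are hypothetical).
covariates = {
    "age":            {"causes_treatment": True,  "causes_outcome": True},
    "blood_pressure": {"causes_treatment": False, "causes_outcome": True},
    "clinic_id":      {"causes_treatment": True,  "causes_outcome": False},
    "eye_color":      {"causes_treatment": False, "causes_outcome": False},
}

# Include a covariate if it is a cause of treatment OR a cause of the outcome.
adjustment_set = sorted(
    name for name, c in covariates.items()
    if c["causes_treatment"] or c["causes_outcome"]
)
print(adjustment_set)  # ['age', 'blood_pressure', 'clinic_id']
```

Note the contrast with, say, selecting only common causes of treatment and outcome: the paper's result is that this more inclusive set suffices whenever any subset of the observed covariates does.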
The evolutionary origins of modularity
A central biological question is how natural organisms are so evolvable (capable of quickly adapting to new environments). A key driver of evolvability is the widespread modularity of biological networks—their organization as functional, sparsely connected subunits—but there is no consensus regarding why modularity itself evolved. Although most hypotheses assume indirect selection for evolvability, here we demonstrate that the ubiquitous, direct selection pressure to reduce the cost of connections between network nodes causes the emergence of modular networks. Computational evolution experiments with selection pressures to maximize network performance and minimize connection costs yield networks that are significantly more modular and more evolvable than control experiments that only select for performance. These results will catalyse research in numerous disciplines, such as neuroscience and genetics, and enhance our ability to harness evolution for engineering purposes.