Catalogue Search | MBRL
40 result(s) for "Cisewski, Jessi"
Generalized Fiducial Inference for Normal Linear Mixed Models
2012
While linear mixed modeling methods are foundational concepts introduced in any statistical education, adequate general methods for interval estimation involving models with more than a few variance components are lacking, especially in the unbalanced setting. Generalized fiducial inference provides a possible framework that accommodates this absence of methodology. Under the fabric of generalized fiducial inference along with sequential Monte Carlo methods, we present an approach for interval estimation for both balanced and unbalanced Gaussian linear mixed models. We compare the proposed method to classical and Bayesian results in the literature in a simulation study of two-fold nested models and two-factor crossed designs with an interaction term. The proposed method is found to be competitive or better when evaluated based on frequentist criteria of empirical coverage and average length of confidence intervals for small sample sizes. A MATLAB implementation of the proposed algorithm is available from the authors.
Journal Article
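The structural-equation idea behind generalized fiducial inference can be illustrated on a toy one-sample normal model: write the data as y_i = mu + sigma * z_i, resample the latent errors, and solve for the parameters. This is only a minimal sketch with invented data, not the paper's mixed-model algorithm, which requires sequential Monte Carlo over many variance components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y_i = mu + sigma * z_i with z_i ~ N(0, 1).
y = rng.normal(loc=5.0, scale=2.0, size=20)
n = len(y)

# Fiducial recipe (toy version): resample the latent z-vector, then
# solve the structural equations for (mu, sigma) by matching the
# sample mean and standard deviation.
samples = []
for _ in range(20000):
    z = rng.normal(size=n)
    sigma_f = y.std(ddof=1) / z.std(ddof=1)  # solve s_y = sigma * s_z
    mu_f = y.mean() - sigma_f * z.mean()     # solve ybar = mu + sigma * zbar
    samples.append(mu_f)

# Percentiles of the fiducial sample give an interval estimate for mu.
lo, hi = np.percentile(samples, [2.5, 97.5])
print(f"95% fiducial interval for mu: ({lo:.2f}, {hi:.2f})")
```

In this simple case the fiducial interval for the mean reproduces the classical t-interval; the value of the framework, per the abstract, is that the same recipe extends to unbalanced mixed models where no exact interval exists.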
Completing the Results of the 2013 Boston Marathon
by Cisewski, Jessi; Dominici, Francesca; Paulson, Charles
in Algorithms; Analysis; Biology and Life Sciences
2014
The 2013 Boston marathon was disrupted by two bombs placed near the finish line. The bombs resulted in three deaths and several hundred injuries. Of lesser concern, in the immediate aftermath, was the fact that nearly 6,000 runners failed to finish the race. We were approached by the marathon's organizers, the Boston Athletic Association (BAA), and asked to recommend a procedure for projecting finish times for the runners who could not complete the race. With assistance from the BAA, we created a dataset consisting of all the runners in the 2013 race who reached the halfway point but failed to finish, as well as all runners from the 2010 and 2011 Boston marathons. The data consist of split times from each of the 5 km sections of the course, as well as the final 2.2 km (from 40 km to the finish). The statistical objective is to predict the missing split times for the runners who failed to finish in 2013. We set this problem in the context of the matrix completion problem, examples of which include imputing missing data in DNA microarray experiments, and the Netflix prize problem. We propose five prediction methods and create a validation dataset to measure their performance by mean squared error and other measures. The best method used local regression based on a K-nearest-neighbors algorithm (KNN method), though several other methods produced results of similar quality. We show how the results were used to create projected times for the 2013 runners and discuss potential for future application of the same methodology. We present the whole project as an example of reproducible research, in that we are able to make the full data and all the algorithms we have used publicly available, which may facilitate future research extending the methods or proposing completely different approaches.
Journal Article
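The K-nearest-neighbors idea the abstract describes, predicting a non-finisher's missing splits from runners with similar observed splits, can be sketched in a few lines. The data here are synthetic stand-ins, not the BAA dataset, and this is plain KNN averaging rather than the paper's local-regression variant.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic split-time matrix: rows = runners, cols = 5 km splits (minutes).
# Each runner has a pace level, plus noise and a mild late-race slowdown.
base = rng.uniform(18, 35, size=(200, 1))
splits = base + rng.normal(0, 1.0, size=(200, 8)) + np.linspace(0, 3, 8)

# A runner who stopped at halfway: first 4 splits observed, last 4 missing.
incomplete = splits[0].copy()
observed, missing = slice(0, 4), slice(4, 8)

# KNN imputation: find the K complete runners whose observed splits are
# closest, then average their later splits as the prediction.
K = 10
complete = splits[1:]
d = np.linalg.norm(complete[:, observed] - incomplete[observed], axis=1)
neighbors = np.argsort(d)[:K]
prediction = complete[neighbors][:, missing].mean(axis=0)
print("predicted remaining splits (min):", np.round(prediction, 1))
```

Replacing the plain average with a regression fit over the neighbors' splits would move this sketch closer to the local-regression KNN method the abstract reports as best.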
Standards for Modest Bayesian Credences
by Schervish, Mark J.; Cisewski, Jessi; Stern, Rafael
in Attitudes; Bayesian analysis; Conditioning
2018
Gordon Belot argues that Bayesian theory is epistemologically immodest. In response, we show that the topological conditions that underpin his criticisms of asymptotic Bayesian conditioning are self-defeating. They require extreme a priori credences regarding, for example, the limiting behavior of observed relative frequencies. We offer a different explication of Bayesian modesty using a goal of consensus: rival scientific opinions should be responsive to new facts as a way to resolve their disputes. We also address Adam Elga's rebuttal to Belot's analysis, which focuses attention on the role that the assumption of countable additivity plays in Belot's criticisms.
Journal Article
Sleeping Beauty’s Credences
by Schervish, Mark J.; Cisewski, Jessi; Stern, Rafael
in Bayesian analysis; Beauty; Conditioning
2016
The Sleeping Beauty problem has spawned a debate between “thirders” and “halfers” who draw conflicting conclusions about Sleeping Beauty's credence that a coin lands heads. Our analysis is based on a probability model for what Sleeping Beauty knows at each time during the experiment. We show that conflicting conclusions result from different modeling assumptions that each group makes. Our analysis uses a standard “Bayesian” account of rational belief with conditioning. No special handling is used for self-locating beliefs or centered propositions. We also explore what fair prices Sleeping Beauty computes for gambles that she might be offered during the experiment.
Journal Article
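The two reference frequencies the rival camps appeal to are easy to exhibit by simulation: count heads per awakening versus heads per experiment. This is only a frequency illustration of why the camps disagree, not the probability model for Sleeping Beauty's knowledge that the paper develops.

```python
import random

random.seed(0)
N = 100_000  # number of simulated runs of the experiment

heads_awakenings = 0
total_awakenings = 0
heads_experiments = 0
for _ in range(N):
    heads = random.random() < 0.5
    wakes = 1 if heads else 2  # heads: Monday only; tails: Monday and Tuesday
    total_awakenings += wakes
    heads_awakenings += wakes if heads else 0
    heads_experiments += heads

# "Thirders" point to the per-awakening frequency (~1/3);
# "halfers" point to the per-experiment frequency (~1/2).
print("per-awakening frequency of heads:",
      round(heads_awakenings / total_awakenings, 3))
print("per-experiment frequency of heads:",
      round(heads_experiments / N, 3))
```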
A Hermite–Gaussian Based Exoplanet Radial Velocity Estimation Method
2021
As the first successful technique used to detect exoplanets orbiting distant stars, the radial velocity method aims to detect a periodic Doppler shift in a stellar spectrum due to the star's motion along the line of sight. We introduce a new, mathematically rigorous approach to detect such a signal that accounts for the smooth functional relationship of neighboring wavelengths in the spectrum, minimizes the role of wavelength interpolation, accounts for heteroskedastic noise and easily allows for accurate calculation of the estimated radial velocity standard error. Using Hermite–Gaussian functions, we show that the problem of detecting a Doppler shift in the spectrum can be reduced to linear regression in many settings. A simulation study demonstrates that the proposed method is able to accurately estimate an individual spectrum's radial velocity with precision below 0.3 m s⁻¹, corresponding to a Doppler shift much smaller than the size of a spectral pixel. Furthermore, the new method outperforms the traditional cross-correlation function approach for estimating the radial velocity by reducing the root mean squared error up to 15 cm s⁻¹. The proposed method is also demonstrated on a new set of observations from the EXtreme PREcision Spectrometer (EXPRES) for the host star 51 Pegasi, and successfully recovers estimates of the planetary companion's parameters that agree well with previous studies. The method is implemented in the R package rvmethod, and supplemental Python code is also available.
Journal Article
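The reduction of Doppler-shift estimation to linear regression rests on a first-order expansion: a small shift changes the spectrum by roughly -(v/c) * lambda * f'(lambda), so v appears as a single regression coefficient. The sketch below demonstrates that expansion on a single invented Gaussian absorption line; the paper's method instead uses a Hermite-Gaussian basis fit, which this simplification does not reproduce.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

# Template spectrum: one Gaussian absorption line (toy model).
wave = np.linspace(5000.0, 5001.0, 400)  # wavelength grid, Angstroms
center, width, depth = 5000.5, 0.05, 0.6
template = 1.0 - depth * np.exp(-0.5 * ((wave - center) / width) ** 2)

# Observed spectrum: the same line Doppler-shifted by v = 100 m/s.
v_true = 100.0
shifted = center * (1.0 + v_true / C)
observed = 1.0 - depth * np.exp(-0.5 * ((wave - shifted) / width) ** 2)

# First order: observed - template ~ -(v/c) * lambda * d(template)/d(lambda),
# so the radial velocity is a one-parameter least-squares coefficient.
deriv = np.gradient(template, wave)
x = -wave * deriv / C
beta = np.sum(x * (observed - template)) / np.sum(x * x)
print(f"estimated radial velocity: {beta:.1f} m/s")
```

Because the shift (about 1.7 mA here) is far smaller than the line width, the linearization is accurate and the regression recovers v to within a few m/s.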
High-energy Neutrino Source Cross-correlations with Nearest-neighbor Distributions
by Fang, Ke; Cisewski-Kehe, Jessi; Banerjee, Arka
in Cross correlation; Distribution functions; Galaxies
2025
The astrophysical origins of the majority of the IceCube neutrinos remain unknown. Effectively characterizing the spatial distribution of the neutrino samples and associating the events with astrophysical source catalogs can be challenging given the large atmospheric neutrino background and underlying non-Gaussian spatial features in the neutrino and source samples. In this paper, we investigate a framework for identifying and statistically evaluating the cross-correlations between IceCube data and an astrophysical source catalog based on the k-nearest-neighbor cumulative distribution functions (kNN-CDFs). We propose a maximum likelihood estimation procedure for inferring the true proportions of astrophysical neutrinos in the point-source data. We conduct a statistical power analysis of an associated likelihood ratio test with estimations of its sensitivity and discovery potential with synthetic neutrino data samples and a WISE-2MASS galaxy sample. We apply the method to IceCube's public ten-year point-source data and find no statistically significant evidence for spatial cross-correlations with the selected galaxy sample. We discuss possible extensions to the current method and explore the method's potential to identify the cross-correlation signals in data sets with different sample sizes.
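The basic kNN-CDF statistic is the empirical distribution of distances from one sample to the k-th nearest point of the other; correlated samples yield a CDF that rises faster than an uncorrelated one. A minimal 2D sketch with invented "source" and "event" samples (standing in for the catalog and neutrino data, and ignoring sky geometry and backgrounds):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)

# Toy 2D "sky": clustered sources, plus events drawn either around the
# same cluster centers (correlated) or uniformly (background-like).
centers = rng.uniform(0, 1, size=(20, 2))
sources = centers[rng.integers(0, 20, 500)] + rng.normal(0, 0.01, (500, 2))
events_corr = centers[rng.integers(0, 20, 300)] + rng.normal(0, 0.01, (300, 2))
events_unif = rng.uniform(0, 1, size=(300, 2))

def knn_cdf(data, query, k, radii):
    """Empirical CDF of the distance from each query point to its
    k-th nearest neighbor in `data`."""
    d, _ = cKDTree(data).query(query, k=k)
    dk = d if k == 1 else d[:, -1]
    return np.array([(dk <= r).mean() for r in radii])

radii = np.linspace(0.0, 0.2, 50)
cdf_corr = knn_cdf(sources, events_corr, k=1, radii=radii)
cdf_unif = knn_cdf(sources, events_unif, k=1, radii=radii)

# Correlated events sit closer to sources, so their kNN-CDF rises faster.
print("CDF at r ~ 0.04:", cdf_corr[10], "vs", cdf_unif[10])
```

The paper's framework builds a likelihood on such summaries to estimate the astrophysical proportion and test for association; this sketch only shows the underlying distance statistic.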
MaxTDA: Robust Statistical Inference for Maximal Persistence in Topological Data Analysis
2025
Persistent homology is an area within topological data analysis (TDA) that can uncover different dimensional holes (connected components, loops, voids, etc.) in data. The holes are characterized, in part, by how long they persist across different scales. Noisy data can result in many additional holes that are not true topological signal. Various robust TDA techniques have been proposed to reduce the number of noisy holes; however, these robust methods have a tendency to also reduce the topological signal. This work introduces Maximal TDA (MaxTDA), a statistical framework addressing a limitation in TDA wherein robust inference techniques systematically underestimate the persistence of significant homological features. MaxTDA combines kernel density estimation with level-set thresholding via rejection sampling to generate consistent estimators for maximal persistence features that minimize bias while maintaining robustness to noise and outliers. We establish the consistency of the sampling procedure and the stability of the maximal persistence estimator. The framework also enables statistical inference on topological features through rejection bands, constructed from quantiles that bound the estimator's deviation probability. MaxTDA is particularly valuable in applications where precise quantification of statistically significant topological features is essential for revealing underlying structural properties in complex datasets. Numerical simulations across varied datasets, including an example from exoplanet astronomy, highlight the effectiveness of MaxTDA in recovering true topological signals.
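The "kernel density estimation with level-set thresholding via rejection sampling" step can be sketched directly: fit a KDE, then keep only resampled points whose estimated density exceeds a threshold, which discards outliers while preserving the high-density loop. All data and the threshold/bandwidth choices below are invented for illustration, and no claim is made that this matches MaxTDA's estimator.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)

# Noisy circle with sparse outliers: the loop is the topological signal.
theta = rng.uniform(0, 2 * np.pi, 400)
circle = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.05, (400, 2))
outliers = rng.uniform(-2, 2, size=(40, 2))
data = np.vstack([circle, outliers])

# Kernel density estimate with a narrow bandwidth (tuning choice), then
# keep only the superlevel set: resample from the KDE and reject points
# whose density falls below a quantile of the data densities.
kde = gaussian_kde(data.T, bw_method=0.15)
dens = kde(data.T)
level = np.quantile(dens, 0.25)  # threshold (tuning choice)

resampled = kde.resample(5000, seed=4).T
keep = resampled[kde(resampled.T) >= level]
print(f"kept {len(keep)} of 5000 resampled points")

# Surviving points hug the unit circle, so persistent homology computed
# on `keep` recovers the loop with far less noise.
radii = np.linalg.norm(keep, axis=1)
print("median radius of kept points:", np.round(np.median(radii), 2))
```

Running persistent homology on `keep` rather than `data` is the sense in which the rejection step trades a tuning threshold for robustness to outliers.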
A Subsequence Approach to Topological Data Analysis for Irregularly-Spaced Time Series
2024
A time-delay embedding (TDE), grounded in the framework of Takens's Theorem, provides a mechanism to represent and analyze the inherent dynamics of time-series data. Recently, topological data analysis (TDA) methods have been applied to study this time series representation mainly through the lens of persistent homology. Current literature on the fusion of TDE and TDA is adept at analyzing uniformly-spaced time series observations. This work introduces a novel subsequence embedding method for irregularly-spaced time-series data. We show that this method preserves the original state space topology while reducing spurious homological features. Theoretical stability results and convergence properties of the proposed method in the presence of noise and varying levels of irregularity in the spacing of the time series are established. Numerical studies and an application to real data illustrate the performance of the proposed method.
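The standard uniformly-spaced TDE that this work generalizes maps a scalar series to delay vectors (x[t], x[t+d], ..., x[t+(m-1)d]); a periodic signal then traces a closed loop, which persistent homology detects as a 1-dimensional hole. A minimal sketch of that baseline construction (the paper's subsequence embedding for irregular spacing is not reproduced here):

```python
import numpy as np

def time_delay_embedding(x, dim, delay):
    """Map a scalar series x into dim-dimensional delay vectors
    (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    n = len(x) - (dim - 1) * delay
    return np.column_stack([x[i * delay : i * delay + n] for i in range(dim)])

# A sine wave sampled uniformly; delay ~ quarter period, so the 2D
# embedding is approximately (sin t, cos t): a circle.
t = np.linspace(0, 4 * np.pi, 400)
x = np.sin(t)
emb = time_delay_embedding(x, dim=2, delay=50)
print(emb.shape)  # (350, 2)

r = np.hypot(emb[:, 0], emb[:, 1])
print("mean radius of embedded points:", round(float(r.mean()), 2))
```

With irregular sampling, the fixed index offsets above no longer correspond to fixed time lags, which is precisely the failure mode the subsequence approach addresses.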
A Divide-and-Conquer Approach to Persistent Homology
2024
Persistent homology is a tool of topological data analysis that has been used in a variety of settings to characterize different dimensional holes in data. However, persistent homology computations can be memory intensive with a computational complexity that does not scale well as the data size becomes large. In this work, we propose a divide-and-conquer (DaC) method to mitigate these issues. The proposed algorithm efficiently finds small, medium, and large-scale holes by partitioning data into sub-regions and uses a Vietoris-Rips filtration. Furthermore, we provide theoretical results that quantify the bottleneck distance between the DaC and true persistence diagrams and the recovery probability of holes in the data. We empirically verify that the observed rate coincides with our theoretical rate, and find that the memory and computational complexity of DaC outperforms an alternative method that relies on a clustering preprocessing step to reduce the memory and computational complexity of the persistent homology computations. Finally, we test our algorithm using spatial data of the locations of lakes in Wisconsin, where classical persistent homology is computationally infeasible.
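The cost being mitigated is visible even in the simplest case: 0-dimensional persistence of a Vietoris-Rips filtration requires all pairwise distances, so memory and time grow quadratically. The sketch below computes H0 persistence exactly via union-find (deaths are single-linkage merge distances) on a tiny invented point set; it is the baseline computation whose scaling motivates partitioning, not the paper's DaC algorithm.

```python
import numpy as np

def h0_persistence(points):
    """0-dimensional persistence of a Vietoris-Rips filtration: every
    component is born at 0, and each merge of two components kills one,
    so the death times are the single-linkage merge distances."""
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    edges = sorted((d[i, j], i, j) for i in range(n) for j in range(i + 1, n))

    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(dist)  # one component dies at this scale
    return deaths  # n - 1 finite deaths; one component lives forever

# Two tight pairs far apart: two short bars, then one long bridge.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
deaths = h0_persistence(pts)
print(deaths)
```

A divide-and-conquer scheme in this spirit would run such computations on sub-regions and then reconcile features that cross region boundaries, which is where the paper's bottleneck-distance guarantees come in.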