Catalogue Search | MBRL

Consumer Behavior in the Online Classroom

by Ferreira, Pedro , Smith, Michael D. , Chen, George H. in Consumer behavior , Consumers , Machine learning

2021

Video is one of the fastest-growing online services offered to consumers. The rapid growth of online video consumption brings new opportunities for marketing executives and researchers to analyze consumer behavior. However, video also introduces new challenges. Specifically, analyzing unstructured video data presents formidable methodological challenges that limit the use of multimedia data to generate marketing insights. To address these challenges, the authors propose a novel video feature framework based on machine learning and computer vision techniques, which helps marketers predict and understand the consumption of online video from a content-based perspective. The authors apply this framework to two unique data sets: one provided by MasterClass, consisting of 771 online videos and more than 2.6 million viewing records from 225,580 consumers, and another from Crash Course, consisting of 1,127 videos focusing on more traditional education disciplines. The analyses show that the framework proposed in this article can be used to accurately predict both individual-level consumer behavior and aggregate video popularity in these two very different contexts. The authors discuss how their findings and methods can be used to advance management and marketing research with unstructured video data in other contexts such as video marketing and entertainment analytics.

Journal Article

Convergence Rates in Forward–Backward Splitting

by Chen, George H-G. , Rockafellar, R. T. in Algorithms , Mathematical programming , Numerical analysis

1997

Forward–backward splitting methods provide a range of approaches to solving large-scale optimization problems and variational inequalities in which structures conducive to decomposition can be utilized. Apart from special cases where the forward step is absent and a version of the proximal point algorithm comes out, efforts at evaluating the convergence potential of such methods have so far relied on Lipschitz properties and strong monotonicity, or inverse strong monotonicity, of the mapping involved in the forward step, the perspective mainly being that of projection algorithms. Here, convergence is analyzed by a technique that allows properties of the mapping in the backward step to be brought in as well. For the first time in such a general setting, global and local contraction rates are derived; moreover, they are derived in a form which makes it possible to determine the optimal step size relative to certain constants associated with the given problem. Insights are thereby gained into the effects of shifting strong monotonicity between the forward and backward mappings when a splitting is selected.

Journal Article
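
The forward–backward iteration described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's analysis: it assumes a separable problem, minimize sum_i 0.5*a_i*(x_i - b_i)^2 + tau*||x||_1, where the forward step is an explicit gradient step on the quadratic and the backward step is the proximal map (soft-thresholding) of the l1 term. All function names and data are hypothetical. It also uses only the textbook bound in which strong monotonicity sits entirely in the forward mapping: if the smooth part is mu-strongly convex with an L-Lipschitz gradient, the forward map contracts with factor max(|1 - step*mu|, |1 - step*L|), which is smallest at step = 2/(mu + L). The paper's more general rates, which also exploit properties of the backward mapping, are not reproduced here.

```python
def soft_threshold(v, t):
    """Proximal map of t*|.| at scalar v: the backward step for an l1 term."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def forward_backward(a, b, tau, step, iters=200):
    """Minimize sum_i 0.5*a_i*(x_i - b_i)**2 + tau*|x_i| by forward-backward splitting."""
    x = [0.0] * len(a)
    history = [x[:]]
    for _ in range(iters):
        # Forward step: explicit gradient step, grad_i = a_i * (x_i - b_i)
        y = [xi - step * ai * (xi - bi) for xi, ai, bi in zip(x, a, b)]
        # Backward step: proximal map of step*tau*||.||_1 (soft-thresholding)
        x = [soft_threshold(yi, step * tau) for yi in y]
        history.append(x[:])
    return x, history


def contraction_factor(step, mu, L):
    """Lipschitz constant of x -> x - step*grad f(x) for mu-strongly convex f
    with L-Lipschitz gradient; it is below 1 only when 0 < step < 2/L."""
    return max(abs(1 - step * mu), abs(1 - step * L))


a = [1.0, 4.0, 10.0]  # curvatures of the quadratic, so mu = 1 and L = 10
b = [3.0, -0.2, 2.0]
tau = 0.5
mu, L = min(a), max(a)
best_step = 2.0 / (mu + L)  # minimizes the factor, giving (L - mu) / (L + mu) = 9/11
x, hist = forward_backward(a, b, tau, best_step)
# This separable problem has a closed-form minimizer, useful as a check
exact = [soft_threshold(bi, tau / ai) for ai, bi in zip(a, b)]
print(x, exact, contraction_factor(best_step, mu, L))
```

Because soft-thresholding is nonexpansive, each iteration shrinks the max-norm distance to the minimizer by at least the forward-step factor, here 9/11; a smaller step such as 0.05 gives the worse factor 0.95.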

Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee

by Chen, George H in Clusters , Data points , Datasets

2025

Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting that we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive with the various baselines tested in terms of the time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets

Paper
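
The compress-then-predict idea in this abstract can be illustrated with a heavily simplified Python sketch. It is not the authors' implementation (their code is in the repository linked above) and rests on several stated assumptions: a greedy epsilon-net on raw Euclidean features stands in for kernel netting on a learned neural-net embedding, a truncated Gaussian kernel with hypothetical `bandwidth` and `cutoff` parameters stands in for the learned kernel, and the prediction is a kernel-weighted Kaplan-Meier curve in which every training point inherits its cluster's weight (equivalent to keeping per-cluster event and at-risk counts). All names and data are made up.

```python
import math


def greedy_epsilon_net(points, eps):
    """Greedily choose centers so every point lies within eps of its assigned center."""
    centers, assignment = [], []
    for p in points:
        for j, c in enumerate(centers):
            if math.dist(p, c) <= eps:
                assignment.append(j)
                break
        else:  # no existing center is close enough, so p becomes a new center
            centers.append(p)
            assignment.append(len(centers) - 1)
    return centers, assignment


def weighted_kaplan_meier(times, events, weights):
    """Kaplan-Meier steps [(t, S(t))] computed with weighted event and at-risk counts."""
    curve, surv = [], 1.0
    for t in sorted({ti for ti, ei, w in zip(times, events, weights) if ei == 1 and w > 0}):
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        died = sum(w for ti, ei, w in zip(times, events, weights) if ti == t and ei == 1)
        surv *= 1.0 - died / at_risk
        curve.append((t, surv))
    return curve


def predict_survival(x, centers, assignment, times, events, bandwidth, cutoff):
    """Weight each cluster by a truncated Gaussian kernel on its center, give each
    training point its cluster's weight, then run weighted Kaplan-Meier."""
    cluster_w = []
    for c in centers:
        d = math.dist(x, c)
        cluster_w.append(math.exp(-(d / bandwidth) ** 2) if d <= cutoff else 0.0)
    point_w = [cluster_w[j] for j in assignment]
    return weighted_kaplan_meier(times, events, point_w)


# Toy training data (1-D features); event = 1 means the event was observed
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,)]
times = [2.0, 3.0, 5.0, 1.0, 4.0]
events = [1, 0, 1, 1, 1]
centers, assignment = greedy_epsilon_net(points, eps=0.5)
curve = predict_survival((0.05,), centers, assignment, times, events, bandwidth=1.0, cutoff=2.0)
print(centers, assignment, curve)
```

For this query only the first cluster receives nonzero weight, so the equal weights cancel and the prediction coincides with the ordinary Kaplan-Meier curve of that cluster's three members.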

An Introduction to Deep Survival Analysis Models for Predicting Time-to-Event Outcomes

by Chen, George H in Data points , Design standards , Differential equations

2024

Many applications involve reasoning about time durations before a critical event happens, also called time-to-event outcomes. When will a customer cancel a subscription, a coma patient wake up, or a convicted criminal reoffend? Time-to-event outcomes have been studied extensively within the field of survival analysis primarily by the statistical, medical, and reliability engineering communities, with textbooks already available in the 1970s and '80s. This monograph aims to provide a reasonably self-contained modern introduction to survival analysis. We focus on predicting time-to-event outcomes at the individual data point level with the help of neural networks. Our goal is to provide the reader with a working understanding of precisely what the basic time-to-event prediction problem is, how it differs from standard regression and classification, and how key "design patterns" have been used time after time to derive new time-to-event prediction models, from classical methods like the Cox proportional hazards model to modern deep learning approaches such as deep kernel Kaplan-Meier estimators and neural ordinary differential equation models. We further delve into two extensions of the basic time-to-event prediction setup: predicting which of several critical events will happen first along with the time until this earliest event happens (the competing risks setting), and predicting time-to-event outcomes given a time series that grows in length over time (the dynamic setting). We conclude with a discussion of a variety of topics such as fairness, causal reasoning, interpretability, and statistical guarantees. Our monograph comes with an accompanying code repository that implements every model and evaluation metric that we cover in detail.

Paper
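
To make concrete how censoring separates time-to-event prediction from standard regression, the classical Kaplan-Meier estimator can be sketched in Python. This is a minimal illustration with made-up data and hypothetical function names, not code from the monograph's repository. A naive estimate that treats censoring times as event times is included for contrast.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(T > t) from right-censored data.

    times[i] is the observed time; events[i] is 1 if the event was observed
    and 0 if the point was censored. Returns [(event_time, S(event_time))].
    """
    steps, surv = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)  # still observed just before t
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        steps.append((t, surv))
    return steps


def naive_survival(times, t):
    """Biased estimate that treats every censoring time as if it were an event time."""
    return sum(1 for ti in times if ti > t) / len(times)


times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]
steps = kaplan_meier(times, events)
print(steps)
print(naive_survival(times, 8))
```

On these toy data Kaplan-Meier gives S(8) = 4/9, whereas the naive estimate gives 1/3 because it counts the subject censored at time 5 as if the event had occurred; Kaplan-Meier instead simply removes that subject from later risk sets.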

Deep Kernel Aalen-Johansen Estimator: An Interpretable and Flexible Neural Net Framework for Competing Risks

by Chen, George H , Shen, Xiaobin in Clusters , Data points , Kernel functions

2025

We propose an interpretable deep competing risks model called the Deep Kernel Aalen-Johansen (DKAJ) estimator, which generalizes the classical Aalen-Johansen nonparametric estimate of cumulative incidence functions (CIFs). Each data point (e.g., patient) is represented as a weighted combination of clusters. If a data point has nonzero weight only for one cluster, then its predicted CIFs correspond to those of the classical Aalen-Johansen estimator restricted to data points from that cluster. These weights come from an automatically learned kernel function that measures how similar any two data points are. On four standard competing risks datasets, we show that DKAJ is competitive with state-of-the-art baselines while being able to provide visualizations to assist model interpretation.

Paper
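
The classical Aalen-Johansen estimator that DKAJ generalizes can be sketched in Python, here with an optional per-point weight argument. In DKAJ the weights come from a learned kernel over clusters; in this sketch they are supplied by hand, which is an assumption made only for illustration, and the function name and data are hypothetical. With weights that are equal and positive on one cluster and zero elsewhere, the output reduces to the classical estimator on that cluster, mirroring the property stated in the abstract.

```python
def aalen_johansen(times, causes, weights=None):
    """Aalen-Johansen cumulative incidence functions (CIFs) for competing risks.

    causes[i] is 0 if point i was censored, else its event type (1, 2, ...).
    Optional nonnegative weights give a weighted variant; all-ones weights give
    the classical estimator. Returns (event_times, cif, final all-cause survival),
    where cif[k][j] is the CIF of event type k at event_times[j].
    """
    if weights is None:
        weights = [1.0] * len(times)
    kinds = sorted({c for c in causes if c != 0})
    event_times = sorted({t for t, c, w in zip(times, causes, weights) if c != 0 and w > 0})
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    running = {k: 0.0 for k in kinds}
    cif = {k: [] for k in kinds}
    for t in event_times:
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        d_all = 0.0
        for k in kinds:
            d_k = sum(w for ti, c, w in zip(times, causes, weights) if ti == t and c == k)
            running[k] += surv * d_k / at_risk  # type-k hazard times prior survival
            d_all += d_k
            cif[k].append(running[k])
        surv *= 1.0 - d_all / at_risk
    return event_times, cif, surv


times = [1.0, 2.0, 3.0, 4.0, 5.0]
causes = [1, 0, 2, 1, 0]
ts, cif, s_end = aalen_johansen(times, causes)
print(ts, cif, s_end)
```

At every event time the CIFs and the all-cause survival sum to one; on the toy data the final values are 7/15, 4/15, and 4/15.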

Explaining the Success of Nearest Neighbor Methods in Prediction

by Chen, George H , Shah, Devavrat in Case studies , Classification , Clustering

2025

Many modern methods for prediction leverage nearest neighbor search to find past training examples most similar to a test example, an idea that dates back in text to at least the 11th century and has stood the test of time. This monograph aims to explain the success of these methods, both in theory, for which we cover foundational nonasymptotic statistical guarantees on nearest-neighbor-based regression and classification, and in practice, for which we gather prominent methods for approximate nearest neighbor search that have been essential to scaling prediction systems reliant on nearest neighbor analysis to handle massive datasets. Furthermore, we discuss connections to learning distances for use with nearest neighbor methods, including how random decision trees and ensemble methods learn nearest neighbor structure, as well as recent developments in crowdsourcing and graphons. In terms of theory, our focus is on nonasymptotic statistical guarantees, which we state in the form of how much training data and what algorithm parameters ensure that a nearest neighbor prediction method achieves a user-specified error tolerance. We begin with the most general of such results for nearest neighbor and related kernel regression and classification in general metric spaces. In such settings, in which we assume very little structure, what enables successful prediction is smoothness in the function being estimated for regression, and a low probability of landing near the decision boundary for classification. In practice, these conditions could be difficult to verify for a real dataset. We then cover recent guarantees on nearest neighbor prediction in the three case studies of time series forecasting, recommending products to people over time, and delineating human organs in medical images by looking at image patches. In these case studies, clustering structure enables successful prediction.

Paper
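
The nearest neighbor regression and classification rules whose guarantees this monograph studies reduce to a few lines of Python. The sketch below uses exact brute-force search with Euclidean distance rather than any of the approximate nearest neighbor methods the monograph surveys; the function names and data are hypothetical, and k plays the role of the algorithm parameter that such guarantees are stated in terms of.

```python
import math
from collections import Counter


def k_nearest(train_x, query, k):
    """Indices of the k training points closest to query (exact brute-force search;
    Euclidean distance, ties broken by index)."""
    order = sorted(range(len(train_x)), key=lambda i: (math.dist(train_x[i], query), i))
    return order[:k]


def knn_regress(train_x, train_y, query, k):
    """Nearest neighbor regression: average the labels of the k nearest points."""
    return sum(train_y[i] for i in k_nearest(train_x, query, k)) / k


def knn_classify(train_x, train_y, query, k):
    """Nearest neighbor classification: majority vote among the k nearest points."""
    votes = Counter(train_y[i] for i in k_nearest(train_x, query, k))
    return votes.most_common(1)[0][0]


train_x = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
print(knn_regress(train_x, [0.0, 1.0, 2.0, 10.0, 11.0], (1.2,), k=3))
print(knn_classify(train_x, ["a", "a", "a", "b", "b"], (10.4,), k=3))
```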

A General Framework for Visualizing Embedding Spaces of Neural Survival Analysis Models Based on Angular Information

by Chen, George H in Clustering , Embedding , Females

2023

We propose a general framework for visualizing any intermediate embedding representation used by any neural survival analysis model. Our framework is based on so-called anchor directions in an embedding space. We show how to estimate these anchor directions using clustering or, alternatively, using user-supplied "concepts" defined by collections of raw inputs (e.g., feature vectors all from female patients could encode the concept "female"). For tabular data, we present visualization strategies that reveal how anchor directions relate to raw clinical features and to survival time distributions. We then show how these visualization ideas extend to handling raw inputs that are images. Our framework is built on looking at angles between vectors in an embedding space, where there could be "information loss" by ignoring magnitude information. We show how this loss results in a "clumping" artifact that appears in our visualizations, and how to reduce this information loss in practice.

Paper
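
The angle-based view in this abstract can be illustrated with a small Python sketch. It assumes, for illustration only, that a concept's anchor direction is the normalized mean of the unit-normalized embeddings of inputs sharing that concept; the paper's own estimators (via clustering or user-supplied concepts) may differ, and the embeddings and function names are made up. The last line shows the magnitude information loss the abstract mentions: rescaling an embedding leaves its angle to the anchor unchanged.

```python
import math


def normalize(v):
    """Rescale v to unit length, discarding its magnitude."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def anchor_direction(embeddings):
    """Hypothetical anchor for a concept: the normalized mean of the
    unit-normalized embeddings of inputs that share the concept."""
    units = [normalize(e) for e in embeddings]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return normalize(mean)


def angle_to_anchor(v, anchor):
    """Angle in radians between embedding v and a unit-length anchor direction."""
    cos = sum(a * b for a, b in zip(normalize(v), anchor))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp guards against rounding error


# Made-up 2-D embeddings of inputs sharing some concept
concept_group = [[1.0, 0.1], [2.0, -0.2], [0.5, 0.0]]
anchor = anchor_direction(concept_group)
print(anchor)
print(angle_to_anchor([3.0, 0.3], anchor), angle_to_anchor([30.0, 3.0], anchor))
```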

Fairness in Survival Analysis with Distributionally Robust Optimization

by Chen, George H , Hu, Shu in Artificial neural networks , Data points , Decomposition

2024

We propose a general approach for encouraging fairness in survival analysis models based on minimizing a worst-case error across all subpopulations that occur with at least a user-specified probability. This approach can be used to convert many existing survival analysis models into ones that simultaneously encourage fairness, without requiring the user to specify which attributes or features to treat as sensitive in the training loss function. From a technical standpoint, our approach applies recent developments of distributionally robust optimization (DRO) to survival analysis. The complication is that existing DRO theory uses a training loss function that decomposes across contributions of individual data points, i.e., any term that shows up in the loss function depends only on a single training point. This decomposition does not hold for commonly used survival loss functions, including for the Cox proportional hazards model, its deep neural network variants, and many other recently developed models that use loss functions involving ranking or similarity score calculations. We address this technical hurdle using a sample splitting strategy. We demonstrate our sample splitting DRO approach by using it to create fair versions of a diverse set of existing survival analysis models including the Cox model (and its deep variant DeepSurv), the discrete-time model DeepHit, and the neural ODE model SODEN. We also establish a finite-sample theoretical guarantee to show what our sample splitting DRO loss converges to. For the Cox model, we further derive an exact DRO approach that does not use sample splitting. For all the models that we convert into DRO variants, we show that the DRO variants often score better on recently established fairness metrics (without incurring a significant drop in accuracy) compared to existing survival analysis fairness regularization techniques.

Paper
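
The non-decomposability this abstract describes is visible in the Cox model's negative log partial likelihood. In the minimal Python sketch below (it assumes no tied event times and uses made-up scores and a hypothetical function name), the term for each observed event involves a log-sum-exp over the whole risk set, so changing the score of a censored point, which contributes no term of its own, still changes the loss. The paper's sample splitting strategy for handling this is not shown.

```python
import math


def cox_neg_log_partial_likelihood(scores, times, events):
    """Negative log partial likelihood of the Cox model, assuming no tied event times.

    scores[i] is the log-risk the model assigns to point i (for example
    beta . x_i, or a neural network output as in DeepSurv).
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i == 1:
            # Risk set: every point whose observed time is at least t_i
            risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
            log_denominator = math.log(sum(math.exp(scores[j]) for j in risk_set))
            total -= scores[i] - log_denominator
    return total


times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]  # the point at time 3.0 is censored
scores = [0.5, -0.2, 0.1, 0.0]
base = cox_neg_log_partial_likelihood(scores, times, events)
# Change only the censored point's score: it has no term of its own, yet the loss moves
perturbed = cox_neg_log_partial_likelihood([0.5, -0.2, 2.0, 0.0], times, events)
print(base, perturbed)
```

With all scores equal to zero, each event contributes the log of its risk-set size, so the loss on these data is log 4 + log 3 + log 1 = log 12.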

Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates

by Chen, George H in Consistency , Error analysis , Estimators

2022

We establish the first nonasymptotic error bounds for Kaplan-Meier-based nearest neighbor and kernel survival probability estimators where feature vectors reside in metric spaces. Our bounds imply rates of strong consistency for these nonparametric estimators and, up to a log factor, match an existing lower bound for conditional CDF estimation. Our proof strategy also yields nonasymptotic guarantees for nearest neighbor and kernel variants of the Nelson-Aalen cumulative hazards estimator. We experimentally compare these methods on four datasets. We find that for the kernel survival estimator, a good choice of kernel is one learned using random survival forests.

Paper
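
A nearest neighbor Kaplan-Meier estimator of the kind analyzed in this abstract can be sketched as follows: find the k training points closest to the query and run Kaplan-Meier on their (time, event) pairs. This Python sketch uses Euclidean distance in place of a general metric, a fixed k, and made-up data; it is an illustration rather than the paper's exact estimator, and the kernel variant would instead weight every training point by its kernel similarity to the query.

```python
import math


def kaplan_meier_at(times, events, t):
    """Kaplan-Meier estimate of P(T > t) from (time, event) pairs; event 1 = observed."""
    surv = 1.0
    for s in sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= t}):
        at_risk = sum(1 for ti in times if ti >= s)
        deaths = sum(1 for ti, e in zip(times, events) if ti == s and e == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def knn_survival(train_x, train_t, train_e, query, k, t):
    """k-NN survival estimate: Kaplan-Meier on the k training points nearest to
    query (Euclidean distance, ties broken by index)."""
    nearest = sorted(range(len(train_x)), key=lambda i: (math.dist(train_x[i], query), i))[:k]
    return kaplan_meier_at([train_t[i] for i in nearest], [train_e[i] for i in nearest], t)


train_x = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
train_t = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
train_e = [1, 1, 0, 1, 0, 1]
print(knn_survival(train_x, train_t, train_e, (0.05,), k=3, t=2.5))
print(knn_survival(train_x, train_t, train_e, (5.05,), k=3, t=13.0))
```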
