41 result(s) for "Loken, Eric"
Measurement error and the replication crisis
The assumption that measurement error always reduces effect sizes is false. Measurement error adds noise to predictions, increases uncertainty in parameter estimates, and makes it more difficult to discover new phenomena or to distinguish among competing theories. A common view is that any study finding an effect under noisy conditions provides evidence that the underlying effect is particularly strong and robust. Yet, statistical significance conveys very little information when measurements are noisy. In noisy research settings, poor measurement can contribute to exaggerated estimates of effect size. This problem and related misunderstandings are key components in a feedback loop that perpetuates the replication crisis in science.
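The abstract's central claim — that noisy measurement plus a significance filter can exaggerate effect sizes — can be illustrated with a minimal simulation. This is an illustrative sketch, not the paper's own analysis; the effect size, noise levels, and sample size below are arbitrary assumptions.

```python
import numpy as np

def significant_estimates(true_effect, noise_sd, n=50, trials=10_000, seed=0):
    """Simulate many studies of a fixed true effect, keep only the
    'statistically significant' ones, and return their mean estimate."""
    rng = np.random.default_rng(seed)
    # Each study: n observations of the true effect plus measurement noise.
    samples = true_effect + rng.normal(0.0, noise_sd, size=(trials, n))
    estimates = samples.mean(axis=1)
    se = samples.std(axis=1, ddof=1) / np.sqrt(n)
    significant = np.abs(estimates / se) > 1.96  # crude two-sided z-test
    return estimates[significant].mean()

# With heavy measurement noise, only inflated estimates clear the
# significance bar, so the published average overstates the true effect.
noisy = significant_estimates(true_effect=0.1, noise_sd=1.0)
clean = significant_estimates(true_effect=0.1, noise_sd=0.2)
```

Under these assumed settings, the significant estimates from the noisy studies land several times larger than the true effect of 0.1, while the precise studies recover it almost unchanged — the feedback loop the abstract describes.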
Machine Learning–Derived Severe Weather Probabilities from a Warn-on-Forecast System
Severe weather probabilities are derived from the Warn-on-Forecast System (WoFS) run by NOAA’s National Severe Storms Laboratory (NSSL) during spring 2018 using the random forest (RF) machine learning algorithm. Recent work has shown this method generates skillful and reliable forecasts when applied to convection-allowing model ensembles for the “Day 1” time range (i.e., 12–36-h lead times), but it has been tested in only one other study for lead times relevant to WoFS (e.g., 0–6 h). Thus, in this paper, various sets of WoFS predictors, which include both environment and storm-based fields, are input into a RF algorithm and trained using the occurrence of severe weather reports within 39 km of a point to produce severe weather probabilities at 0–3-h lead times. We analyze the skill and reliability of these forecasts, sensitivity to different sets of predictors, and avenues for further improvements. The RF algorithm produced very skillful and reliable severe weather probabilities and significantly outperformed baseline probabilities calculated by finding the best performing updraft helicity (UH) threshold and smoothing parameter. Experiments where different sets of predictors were used to derive RF probabilities revealed 1) storm attribute fields contributed significantly more skill than environmental fields, 2) 2–5 km AGL UH and maximum updraft speed were the best performing storm attribute fields, 3) the most skillful ensemble summary metric was a smoothed mean, and 4) the most skillful forecasts were obtained when smoothed UH from individual ensemble members were used as predictors.
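The RF workflow the abstract describes — storm and environment predictor fields mapped to severe-report probabilities — can be sketched with scikit-learn. The WoFS fields are not reproduced here, so the predictors below are synthetic stand-ins, and the use of `RandomForestClassifier` with these hyperparameters is an assumption, not the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Hypothetical stand-ins for WoFS predictors: two storm-attribute fields
# (2-5 km AGL updraft helicity, max updraft speed) and one environment field.
n = 2000
uh = rng.gamma(2.0, 20.0, n)      # updraft helicity (m^2 s^-2)
wmax = rng.gamma(2.0, 5.0, n)     # max updraft speed (m s^-1)
cape = rng.gamma(2.0, 500.0, n)   # environmental CAPE (J kg^-1)
X = np.column_stack([uh, wmax, cape])

# Synthetic labels: a severe report near a point is more likely
# when the storm-attribute fields are large.
logit = 0.03 * uh + 0.1 * wmax - 4.0
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=20,
                            random_state=0).fit(X, y)
probs = rf.predict_proba(X)[:, 1]  # severe weather probability per point
```

In the paper the labels come from observed severe reports within 39 km of a point at 0-3-h lead times; here they are generated from the predictors themselves purely for illustration.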
Comparing and Interpreting Differently Designed Random Forests for Next-Day Severe Weather Hazard Prediction
Recent research has shown that random forests (RFs) can create skillful probabilistic severe weather hazard forecasts from numerical weather prediction (NWP) ensemble data. However, it remains unclear how RFs use NWP data and how predictors should be generated from NWP ensembles. This paper compares two methods for creating RFs for next-day severe weather prediction using simulated forecast data from the convection-allowing High-Resolution Ensemble Forecast System, version 2.1 (HREFv2.1). The first method uses predictors from individual ensemble members (IM) at the point of prediction, while the second uses ensemble mean (EM) predictors at multiple spatial points. IM and EM RFs are trained with all predictors as well as predictor subsets, and the Python module tree interpreter (TI) is used to assess RF variable importance and the relationships learned by the RFs. Results show that EM RFs have better objective skill compared to similarly configured IM RFs for all hazards, presumably because EM predictors contain less noise. In both IM and EM RFs, storm variables are found to be most important, followed by index and environment variables. Interestingly, RFs created from storm and index variables tend to produce forecasts with greater or equal skill than those from the all-predictor RFs. TI analysis shows that the RFs emphasize different predictors for different hazards in a way that makes physical sense. Further, TI shows that RFs create calibrated hazard probabilities based on complex, multivariate relationships that go well beyond thresholding 2–5-km updraft helicity.
Generating Probabilistic Next-Day Severe Weather Forecasts from Convection-Allowing Ensembles Using Random Forests
Extracting explicit severe weather forecast guidance from convection-allowing ensembles (CAEs) is challenging since CAEs cannot directly simulate individual severe weather hazards. Currently, CAE-based severe weather probabilities must be inferred from one or more storm-related variables, which may require extensive calibration and/or contain limited information. Machine learning (ML) offers a way to obtain severe weather forecast probabilities from CAEs by relating CAE forecast variables to observed severe weather reports. This paper develops and verifies a random forest (RF)-based ML method for creating day 1 (1200–1200 UTC) severe weather hazard probabilities and categorical outlooks based on 0000 UTC Storm-Scale Ensemble of Opportunity (SSEO) forecast data and observed Storm Prediction Center (SPC) storm reports. RF forecast probabilities are compared against severe weather forecasts from calibrated SSEO 2–5-km updraft helicity (UH) forecasts and SPC convective outlooks issued at 0600 UTC. Continuous RF probabilities routinely have the highest Brier skill scores (BSSs), regardless of whether the forecasts are evaluated over the full domain or regional/seasonal subsets. Even when RF probabilities are truncated at the probability levels issued by the SPC, the RF forecasts often have BSSs better than or comparable to corresponding UH and SPC forecasts. Relative to the UH and SPC forecasts, the RF approach performs best for severe wind and hail prediction during the spring and summer (i.e., March–August). Overall, it is concluded that the RF method presented here provides skillful, reliable CAE-derived severe weather probabilities that may be useful to severe weather forecasters and decision-makers.
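Several of these abstracts score forecasts with the Brier skill score (BSS). A minimal NumPy implementation, assuming the conventional climatological reference forecast (the papers may use a different reference):

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error of probabilistic forecasts against 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return np.mean((probs - outcomes) ** 2)

def brier_skill_score(probs, outcomes, climatology=None):
    """BSS = 1 - BS / BS_ref, with the reference forecast taken as the
    observed climatological frequency unless one is supplied."""
    outcomes = np.asarray(outcomes, dtype=float)
    if climatology is None:
        climatology = outcomes.mean()
    bs_ref = brier_score(np.full_like(outcomes, climatology), outcomes)
    return 1.0 - brier_score(probs, outcomes) / bs_ref

# Toy event series: BSS > 0 means the forecast beats climatology.
outcomes = np.array([1, 0, 0, 1, 0, 0, 0, 1])
sharp = np.array([0.9, 0.1, 0.2, 0.8, 0.1, 0.1, 0.2, 0.7])
bss = brier_skill_score(sharp, outcomes)
```

A perfect forecast scores 1, the climatological forecast scores 0, and the sharp toy forecast above lands in between, close to 1.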
Spread and Skill in Mixed- and Single-Physics Convection-Allowing Ensembles
Spread and skill of mixed- and single-physics convection-allowing ensemble forecasts that share the same set of perturbed initial and lateral boundary conditions are investigated at a variety of spatial scales. Forecast spread is assessed for 2-m temperature, 2-m dewpoint, 500-hPa geopotential height, and hourly accumulated precipitation both before and after a bias-correction procedure is applied. Time series indicate that the mixed-physics ensemble forecasts generally have greater variance than comparable single-physics forecasts. While the differences tend to be small, they are greatest at the smallest spatial scales and when the ensembles are not calibrated for bias. Although differences between the mixed- and single-physics ensemble variances are smaller for the larger spatial scales, variance ratios suggest that the mixed-physics ensemble generates more spread relative to the single-physics ensemble at larger spatial scales. Forecast skill is evaluated for 2-m temperature, dewpoint temperature, and bias-corrected 6-h accumulated precipitation. The mixed-physics ensemble generally has lower 2-m temperature and dewpoint root-mean-square error (RMSE) compared to the single-physics ensemble. However, little difference in skill or reliability is found between the mixed- and single-physics bias-corrected precipitation forecasts. Overall, given that mixed- and single-physics ensembles have similar spread and skill, developers may prefer to implement single- as opposed to mixed-physics convection-allowing ensembles in future operational systems, while accounting for model error using stochastic methods.
Human face recognition ability is specific and highly heritable
Compared with notable successes in the genetics of basic sensory transduction, progress on the genetics of higher level perception and cognition has been limited. We propose that investigating specific cognitive abilities with well-defined neural substrates, such as face recognition, may yield additional insights. In a twin study of face recognition, we found that the correlation of scores between monozygotic twins (0.70) was more than double the dizygotic twin correlation (0.29), evidence for a high genetic contribution to face recognition ability. Low correlations between face recognition scores and visual and verbal recognition scores indicate that both face recognition ability itself and its genetic basis are largely attributable to face-specific mechanisms. The present results therefore identify an unusual phenomenon: a highly specific cognitive ability that is highly heritable. Our results establish a clear genetic basis for face recognition, opening this intensively studied and socially advantageous cognitive trait to genetic investigation.
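The twin correlations reported in this abstract support a quick back-of-envelope heritability estimate via Falconer's formula, h² = 2(r_MZ − r_DZ). This is a classical approximation, not necessarily the model the authors fit, and an MZ correlation more than double the DZ correlation (as here) suggests nonadditive genetic effects that make it rough.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's estimate of heritability from twin correlations:
    h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)

# Correlations reported in the abstract: MZ = 0.70, DZ = 0.29.
h2 = falconer_heritability(0.70, 0.29)
```

This yields h² ≈ 0.82, consistent with the abstract's description of face recognition as highly heritable.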
The Statistical Crisis in Science
[...]on: A single overarching research hypothesis (in this case, the idea that issue context interacts with political partisanship to affect mathematical problem-solving skills) corresponds to many possible choices of a decision variable. How to Test a Hypothesis: In general, we could think of four classes of procedures for hypothesis testing: (1) a simple classical test based on a unique test statistic, T, which when applied to the observed data yields T(y), where y represents the data; (2) a classical test prechosen from a set of possible tests, yielding T(y; φ), with preregistered φ (for example, φ might correspond to choices of control variables in a regression, transformations, or the decision of which main effect or interaction to focus on); (3) researcher degrees of freedom without fishing, which consists of computing a single test based on the data, but in an environment where a different test would have been performed given different data; the result of such a course is T(y; φ(y)), where the function φ(·) is observed only at the data actually seen. In the hypothetical example presented earlier, finding a difference in the healthcare context might be taken as evidence that that is the most important context in which to explore differences.
One can follow up an open-ended analysis with prepublication replication, which is related to the idea of external validation, popular in statistics and computer science.
Postprocessing Next-Day Ensemble Probabilistic Precipitation Forecasts Using Random Forests
Most ensembles suffer from underdispersion and systematic biases. One way to correct for these shortcomings is via machine learning (ML), which is advantageous due to its ability to identify and correct nonlinear biases. This study uses a single random forest (RF) to calibrate next-day (i.e., 12–36-h lead time) probabilistic precipitation forecasts over the contiguous United States (CONUS) from the Short-Range Ensemble Forecast System (SREF) with 16-km grid spacing and the High-Resolution Ensemble Forecast version 2 (HREFv2) with 3-km grid spacing. Random forest forecast probabilities (RFFPs) from each ensemble are compared against raw ensemble probabilities over 496 days from April 2017 to November 2018 using 16-fold cross validation. RFFPs are also compared against spatially smoothed ensemble probabilities since the raw SREF and HREFv2 probabilities are overconfident and undersample the true forecast probability density function. Probabilistic precipitation forecasts are evaluated at four precipitation thresholds ranging from 0.1 to 3 in. In general, RFFPs are found to have better forecast reliability and resolution, fewer spatial biases, and significantly greater Brier skill scores and areas under the relative operating characteristic curve compared to corresponding raw and spatially smoothed ensemble probabilities. The RFFPs perform best at the lower thresholds, which have a greater observed climatological frequency. Additionally, the RF-based postprocessing technique benefits the SREF more than the HREFv2, likely because the raw SREF forecasts contain more systematic biases than those from the raw HREFv2. It is concluded that the RFFPs provide a convenient, skillful summary of calibrated ensemble output and are computationally feasible to implement in real time. Advantages and disadvantages of ML-based postprocessing techniques are discussed.
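The spatially smoothed baseline this abstract compares against can be sketched in a few lines: compute the raw ensemble exceedance probability at each grid point, then smooth it spatially. The ensemble data, grid size, and smoothing length scale below are all illustrative assumptions, and SciPy's `gaussian_filter` stands in for whatever smoother the paper uses.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

# Hypothetical ensemble: 10 members of 24-h precipitation on a 50x50 grid (mm).
members = rng.gamma(0.5, 8.0, size=(10, 50, 50))

threshold_mm = 25.4  # the 1-inch threshold from the abstract's range
raw_prob = (members > threshold_mm).mean(axis=0)      # fraction of members
smoothed_prob = gaussian_filter(raw_prob, sigma=3.0)  # spatial smoothing
```

Smoothing spreads sharp, overconfident raw probabilities over neighboring points, which is why the smoothed baseline is less overconfident than the raw ensemble — at the cost of the resolution that the RF postprocessing is shown to recover.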