131 result(s) for "Hothorn, Torsten"
Boosting Algorithms: Regularization, Prediction and Model Fitting
We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions.
Relationship of insect biomass and richness with land use along a climate gradient
Recently reported insect declines have raised both political and social concern. Although the declines have been attributed to land use and climate change, supporting evidence suffers from low taxonomic resolution, short time series, a focus on local scales, and the collinearity of the identified drivers. In this study, we conducted a systematic assessment of insect populations in southern Germany, which showed that differences in insect biomass and richness are highly context dependent. We found the largest difference in biomass between semi-natural and urban environments (−42%), whereas differences in total richness (−29%) and the richness of threatened species (−56%) were largest from semi-natural to agricultural environments. These results point to urbanization and agriculture as major drivers of decline. We also found that richness and biomass increase monotonically with increasing temperature, independent of habitat. The contrasting patterns of insect biomass and richness question the use of these indicators as mutual surrogates. Our study provides support for the implementation of more comprehensive measures aimed at habitat restoration in order to halt insect declines. Land use is a key control of insect communities. Here the authors investigate relationships of insect biomass and richness with land use along a climate gradient, finding evidence of urbanisation and agriculture as drivers of decline, and of biomass and species richness not being suitable as mutual surrogates.
Most Likely Transformations
We propose and study properties of maximum likelihood estimators in the class of conditional transformation models. Based on a suitable explicit parameterization of the unconditional or conditional transformation function, we establish a cascade of increasingly complex transformation models that can be estimated, compared and analysed in the maximum likelihood framework. Models for the unconditional or conditional distribution function of any univariate response variable can be set up and estimated in the same theoretical and computational framework simply by choosing an appropriate transformation function and parameterization thereof. The ability to evaluate the distribution function directly allows us to estimate models based on the exact likelihood, especially in the presence of random censoring or truncation. For discrete and continuous responses, we establish the asymptotic normality of the proposed estimators. A reference software implementation of maximum likelihood-based estimation for conditional transformation models that allows the same flexibility as the theory developed here was employed to illustrate the wide range of possible applications.
Distributional Regression Forests for Probabilistic Precipitation Forecasting in Complex Terrain
To obtain a probabilistic model for a dependent variable based on some set of explanatory variables, a distributional approach is often adopted where the parameters of the distribution are linked to regressors. In many classical models this only captures the location of the distribution but over the last decade there has been increasing interest in distributional regression approaches modeling all parameters including location, scale and shape. Notably, so-called nonhomogeneous Gaussian regression (NGR) models both mean and variance of a Gaussian response and is particularly popular in weather forecasting. Moreover, generalized additive models for location, scale and shape (GAMLSS) provide a framework where each distribution parameter is modeled separately capturing smooth linear or nonlinear effects. However, when variable selection is required and/or there are nonsmooth dependencies or interactions (especially unknown or of high-order), it is challenging to establish a good GAMLSS. A natural alternative in these situations would be the application of regression trees or random forests but, so far, no general distributional framework is available for these. Therefore, a framework for distributional regression trees and forests is proposed that blends regression trees and random forests with classical distributions from the GAMLSS framework as well as their censored or truncated counterparts. To illustrate these novel approaches in practice, they are employed to obtain probabilistic precipitation forecasts at numerous sites in a mountainous region (Tyrol, Austria) based on a large number of numerical weather prediction quantities. It is shown that the novel distributional regression forests automatically select variables and interactions, performing on par or often even better than GAMLSS specified either through prior meteorological knowledge or a computationally more demanding boosting approach.
A Robust Procedure for Comparing Multiple Means under Heteroscedasticity in Unbalanced Designs
Investigating differences between means of more than two groups or experimental conditions is a routine research question addressed in biology. In order to assess differences statistically, multiple comparison procedures are applied. The most prominent procedures of this type, the Dunnett and Tukey-Kramer tests, control the probability of reporting at least one false positive result when the data are normally distributed and when the sample sizes and variances do not differ between groups. All three assumptions are unrealistic in biological research and any violation leads to an increased number of reported false positive results. Based on a general statistical framework for simultaneous inference and robust covariance estimators we propose a new statistical multiple comparison procedure for assessing multiple means. In contrast to the Dunnett or Tukey-Kramer tests, no assumptions regarding the distribution, sample sizes or variance homogeneity are necessary. The performance of the new procedure is assessed by means of its familywise error rate and power under different distributions. The practical merits are demonstrated by a reanalysis of fatty acid phenotypes of the bacterium Bacillus simplex from the "Evolution Canyons" I and II in Israel. The simulation results show that even under severely varying variances, the procedure controls the number of false positive findings very well. Thus, the procedure presented here works well under biologically realistic scenarios of unbalanced group sizes, non-normality and heteroscedasticity.
Generalized Maximally Selected Statistics
Maximally selected statistics for the estimation of simple cutpoint models are embedded into a generalized conceptual framework based on conditional inference procedures. This powerful framework contains most of the published procedures in this area as special cases, such as maximally selected χ² and rank statistics, but also allows for direct construction of new test procedures for less standard test problems. As an application, a novel maximally selected rank statistic is derived from this framework for a censored response partitioned with respect to two ordered categorical covariates and potential interactions. This new test is employed to search for a high-risk group of rectal cancer patients treated with a neo-adjuvant chemoradiotherapy. Moreover, a new efficient algorithm for the evaluation of the asymptotic distribution for a large class of maximally selected statistics is given enabling the fast evaluation of a large number of cutpoints.
Unbiased Recursive Partitioning: A Conditional Inference Framework
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, however lacking a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions and therefore the models induced by both approaches are structurally different, confirming the need for an unbiased variable selection. Moreover, it is shown that the prediction accuracy of trees with early stopping is equivalent to the prediction accuracy of pruned trees with unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on glaucoma classification, node positive breast cancer survival and mammography experience are re-analyzed.
Small beetle, large‐scale drivers: how regional and landscape factors affect outbreaks of the European spruce bark beetle
Unprecedented bark beetle outbreaks have been observed for a variety of forest ecosystems recently, and damage is expected to further intensify as a consequence of climate change. In Central Europe, the response of ecosystem management to increasing infestation risk has hitherto focused largely on the stand level, while the contingency of outbreak dynamics on large‐scale drivers remains poorly understood. To investigate how factors beyond the local scale contribute to the infestation risk from Ips typographus (Col., Scol.), we analysed drivers across seven orders of magnitude in scale (from 10³ to 10¹⁰ m²) over a 23‐year period, focusing on the Bavarian Forest National Park. Time‐discrete hazard modelling was used to account for local factors and temporal dependencies. Subsequently, beta regression was applied to determine the influence of regional and landscape factors, the latter characterized by means of graph theory. We found that in addition to stand variables, large‐scale drivers also strongly influenced bark beetle infestation risk. Outbreak waves were closely related to landscape‐scale connectedness of both host and beetle populations as well as to regional bark beetle infestation levels. Furthermore, regional summer drought was identified as an important trigger for infestation pulses. Large‐scale synchrony and connectivity are thus key drivers of the recently observed bark beetle outbreak in the area. Synthesis and applications. Our multiscale analysis provides evidence that the risk for biotic disturbances is highly dependent on drivers beyond the control of traditional stand‐scale management. This finding highlights the importance of fostering the ability to cope with and recover from disturbance. It furthermore suggests that a stronger consideration of landscape and regional processes is needed to address changing disturbance regimes in ecosystem management.
Radar vision in the mapping of forest biodiversity from space
Recent progress in remote sensing provides much-needed, large-scale spatio-temporal information on habitat structures important for biodiversity conservation. Here we examine the potential of a newly launched satellite-borne radar system (Sentinel-1) to map the biodiversity of twelve taxa across five temperate forest regions in central Europe. We show that the sensitivity of radar to habitat structure is similar to that of airborne laser scanning (ALS), the current gold standard in the measurement of forest structure. Our models of different facets of biodiversity reveal that radar performs as well as ALS; median R² over twelve taxa by ALS and radar are 0.51 and 0.57 respectively for the first non-metric multidimensional scaling axes representing assemblage composition. We further demonstrate the promising predictive ability of radar-derived data with external validation based on the species composition of birds and saproxylic beetles. Establishing new area-wide biodiversity monitoring by remote sensing will require the coupling of radar data to stratified and standardized collected local species data. Satellite-borne radar systems are promising tools to obtain spatial habitat data with complete geographic coverage. Here the authors show that freely available Sentinel-1 radar data perform as well as standard airborne laser scanning data for mapping biodiversity of 12 taxa across temperate forests in Germany.
Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression
An open competition to predict the progression of amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig's disease) from the largest database of ALS clinical trial data yields potential new biomarkers and algorithms that outperform human clinicians. Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with substantial heterogeneity in its clinical presentation. This makes diagnosis and effective treatment difficult, so better tools for estimating disease progression are needed. Here, we report results from the DREAM-Phil Bowen ALS Prediction Prize4Life challenge. In this crowdsourcing competition, competitors developed algorithms for the prediction of disease progression of 1,822 ALS patients from standardized, anonymized phase 2/3 clinical trials. The two best algorithms outperformed a method designed by the challenge organizers as well as predictions by ALS clinicians. We estimate that using both winning algorithms in future trial designs could reduce the required number of patients by at least 20%. The DREAM-Phil Bowen ALS Prediction Prize4Life challenge also identified several potential nonstandard predictors of disease progression including uric acid, creatinine and, surprisingly, blood pressure, shedding light on ALS pathobiology. This analysis reveals the potential of a crowdsourcing competition that uses clinical trial data for accelerating ALS research and development.