Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
946 result(s) for "Statistische Methodenlehre"
Influential Observations and Inference in Accounting Research
by Leone, Andrew J.; Minutti-Meza, Miguel; Wasley, Charles E.
in Accounting, Efficacy, Financial reporting
2019
Accounting studies often encounter observations with extreme values that can influence coefficient estimates and inferences. Two widely used approaches to address influential observations in accounting studies are winsorization and truncation. While expedient, both depend on researcher-selected cutoffs, applied on a variable-by-variable basis, which, unfortunately, can alter legitimate data points. We compare the efficacy of winsorization, truncation, influence diagnostics (Cook's Distance), and robust regression at identifying influential observations. Replication of three published accounting studies shows that the choice impacts estimates and inferences. Simulation evidence shows that winsorization and truncation are ineffective at identifying influential observations. While influence diagnostics and robust regression both outperform winsorization and truncation, overall, robust regression outperforms the other methods. Since robust regression is a theoretically appealing and easily implementable approach based on a model's residuals, we recommend that future accounting studies consider using robust regression, or at least report sensitivity tests using robust regression.
Journal Article
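A minimal sketch of the comparison discussed in the abstract above, on synthetic data with standard Python libraries (scipy's winsorize, statsmodels' OLS influence diagnostics and Huber robust regression); the 1% winsorization limits and the 4/n Cook's distance cutoff are illustrative assumptions, not the authors' settings.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)
y[:5] += 25.0                                   # plant a handful of influential observations
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                                               # contaminated baseline
wins = sm.OLS(np.asarray(winsorize(y, limits=[0.01, 0.01])), X).fit()  # winsorize y at 1%/99%
cooks_d = ols.get_influence().cooks_distance[0]                        # influence diagnostics
keep = cooks_d < 4.0 / n                                               # rule-of-thumb cutoff
screened = sm.OLS(y[keep], X[keep]).fit()
huber = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()                 # robust regression

for name, res in [("OLS", ols), ("winsorized", wins),
                  ("Cook's D screened", screened), ("robust (Huber)", huber)]:
    print(f"{name:>18}: slope = {res.params[1]:.3f}   (true value 0.5)")
```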
Too Big to Fail: Large Samples and the p-Value Problem
by Lin, Mingfeng; Lucas, Henry C.; Shmueli, Galit
in Confidence intervals, Electronic commerce, Hypotheses
2013
The Internet has provided IS researchers with the opportunity to conduct studies with extremely large samples, frequently well over 10,000 observations. There are many advantages to large samples, but researchers using statistical inference must be aware of the p-value problem associated with them. In very large samples, p-values go quickly to zero, and solely relying on p-values can lead the researcher to claim support for results of no practical significance. In a survey of large sample IS research, we found that a significant number of papers rely on a low p-value and the sign of a regression coefficient alone to support their hypotheses. This research commentary recommends a series of actions the researcher can take to mitigate the p-value problem in large samples and illustrates them with an example of over 300,000 camera sales on eBay. We believe that addressing the p-value problem will increase the credibility of large sample IS research as well as provide more insights for readers.
Journal Article
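A small, self-contained illustration of the p-value problem described above, on simulated data rather than the paper's eBay sample: with n = 300,000 even a practically negligible effect comes out "significant", so the confidence interval and R-squared carry the real information.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300_000
x = rng.normal(size=n)
y = 0.01 * x + rng.normal(size=n)             # true effect is tiny (0.01 sd per sd of x)

res = sm.OLS(y, sm.add_constant(x)).fit()
lo, hi = res.conf_int()[1]
print(f"slope  = {res.params[1]:.4f}, p-value = {res.pvalues[1]:.2e}")  # p is essentially zero
print(f"95% CI = [{lo:.4f}, {hi:.4f}]")       # the interval shows how small the effect is
print(f"R^2    = {res.rsquared:.5f}")         # variance explained is negligible
```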
Statistical Inference with PLSc Using Bootstrap Confidence Intervals
by Rönkkö, Mikko; Aguirre-Urreta, Miguel I.
in Bootstrap method, Confidence intervals, Estimation bias
2018
Partial least squares (PLS) is one of the most popular statistical techniques in use in the Information Systems field. When applied to data originating from a common factor model, as is often the case in the discipline, PLS will produce biased estimates. A recent development, consistent PLS (PLSc), has been introduced to correct for this bias. In addition, the common practice in PLS of comparing the ratio of an estimate to its standard error to a t distribution for the purposes of statistical inference has also been challenged. We contribute to the practice of research in the IS discipline by providing evidence of the value of employing bootstrap confidence intervals in conjunction with PLSc, which is a more appropriate alternative than PLS for many of the research scenarios that are of interest to the field. Such evidence is direly needed before a complete approach to the estimation of SEM that relies on both PLSc and bootstrap CIs can be widely adopted. We also provide recommendations for researchers on the use of confidence intervals with PLSc.
Journal Article
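PLSc itself has no standard Python implementation, so the following is only a generic sketch of the bootstrap percentile confidence-interval machinery the abstract advocates; the `bootstrap_percentile_ci` helper, the `estimate` callable, and the correlation-based stand-in statistic are illustrative assumptions.

```python
import numpy as np

def bootstrap_percentile_ci(x, y, estimate, n_boot=2000, alpha=0.05, seed=0):
    """Resample (x, y) pairs with replacement and return a percentile CI for `estimate`."""
    rng = np.random.default_rng(seed)
    n = len(x)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample cases, not residuals
        stats[b] = estimate(x[idx], y[idx])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

# Illustrative use: CI for a simple correlation standing in for a path coefficient.
rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.4 * x + rng.normal(size=200)
lo, hi = bootstrap_percentile_ci(x, y, lambda a, b: np.corrcoef(a, b)[0, 1])
print(f"95% percentile CI: [{lo:.3f}, {hi:.3f}]")  # inference without a t reference distribution
```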
Near-Optimal A-B Testing
2020
We consider the problem of A-B testing when the impact of the treatment is marred by a large number of covariates. Randomization can be highly inefficient in such settings, and thus we consider the problem of optimally allocating test subjects to either treatment with a view to maximizing the precision of our estimate of the treatment effect. Our main contribution is a tractable algorithm for this problem in the online setting, where subjects arrive, and must be assigned, sequentially, with covariates drawn from an elliptical distribution with finite second moment. We further characterize the gain in precision afforded by optimized allocations relative to randomized allocations, and show that this gain grows large as the number of covariates grows. Our dynamic optimization framework admits several generalizations that incorporate important operational constraints such as the consideration of selection bias, budgets on allocations, and endogenous stopping times. In a set of numerical experiments, we demonstrate that our method simultaneously offers better statistical efficiency and less selection bias than state-of-the-art competing biased coin designs.
Journal Article
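This is not the paper's algorithm, only a toy online assignment rule in the same spirit: each arriving subject is sent to whichever arm most reduces the covariate imbalance between treatment and control, instead of being randomized; the dimension, sample size, and covariate distribution are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 5, 1000
treat_sum, ctrl_sum = np.zeros(d), np.zeros(d)   # running covariate totals per arm
assignments = []

for _ in range(n):
    x = rng.normal(size=d)                       # covariates of the arriving subject
    imb_if_treat = np.linalg.norm((treat_sum + x) - ctrl_sum)
    imb_if_ctrl = np.linalg.norm(treat_sum - (ctrl_sum + x))
    if imb_if_treat < imb_if_ctrl:               # choose the arm that keeps covariates balanced
        treat_sum += x
        assignments.append(1)
    else:
        ctrl_sum += x
        assignments.append(0)

print("treated fraction:  ", np.mean(assignments))
print("residual imbalance:", np.linalg.norm(treat_sum - ctrl_sum))
```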
The proportion for splitting data into training and test set for the bootstrap in classification problems
2021
Background: The bootstrap can be an alternative to cross-validation as a training/test set splitting method since it minimizes the computing time in classification problems in comparison to the tenfold cross-validation. Objectives: This research investigates what proportion should be used to split the dataset into the training and the testing set so that the bootstrap might be competitive in terms of accuracy to other resampling methods. Methods/Approach: Different train/test split proportions are used with the following resampling methods: the bootstrap, the leave-one-out cross-validation, the tenfold cross-validation, and the random repeated train/test split to test their performance on several classification methods. The classification methods used include the logistic regression, the decision tree, and the k-nearest neighbours. Results: The findings suggest that using a different structure of the test set (e.g. 30/70, 20/80) can further optimize the performance of the bootstrap when applied to the logistic regression and the decision tree. For the k-nearest neighbour, the tenfold cross-validation with a 70/30 train/test splitting ratio is recommended. Conclusions: Depending on the characteristics and the preliminary transformations of the variables, the bootstrap can improve the accuracy of the classification problem.
Journal Article
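A rough sketch, with synthetic data and scikit-learn, of the kind of comparison the abstract describes: an out-of-bag bootstrap accuracy estimate set against tenfold cross-validation for a logistic regression; the sample size, number of resamples, and classifier are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

def bootstrap_oob_accuracy(model, X, y, n_boot=100):
    """Fit on bootstrap resamples, score on the out-of-bag observations."""
    n, scores = len(y), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # training set drawn with replacement
        oob = np.setdiff1d(np.arange(n), idx)     # roughly 36.8% of cases form the test set
        model.fit(X[idx], y[idx])
        scores.append(model.score(X[oob], y[oob]))
    return float(np.mean(scores))

model = LogisticRegression(max_iter=1000)
print("bootstrap OOB accuracy:", round(bootstrap_oob_accuracy(model, X, y), 3))
print("tenfold CV accuracy:   ", round(cross_val_score(model, X, y, cv=10).mean(), 3))
```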
Randomization Tests Under an Approximate Symmetry Assumption
2017
This paper develops a theory of randomization tests under an approximate symmetry assumption. Randomization tests provide a general means of constructing tests that control size in finite samples whenever the distribution of the observed data exhibits symmetry under the null hypothesis. Here, by "exhibits symmetry" we mean that the distribution remains invariant under a group of transformations. In this paper, we provide conditions under which the same construction can be used to construct tests that asymptotically control the probability of a false rejection whenever the distribution of the observed data exhibits approximate symmetry in the sense that the limiting distribution of a function of the data exhibits symmetry under the null hypothesis. An important application of this idea is in settings where the data may be grouped into a fixed number of "clusters" with a large number of observations within each cluster. In such settings, we show that the distribution of the observed data satisfies our approximate symmetry requirement under weak assumptions. In particular, our results allow for the clusters to be heterogeneous and also have dependence not only within each cluster, but also across clusters. This approach enjoys several advantages over other approaches in these settings.
Journal Article
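A toy version of the clustered setting described above, assuming a small fixed number of clusters: cluster-level t-statistics are recombined under the full group of sign changes, which is valid when their limiting distribution is (approximately) symmetric about zero under the null; the cluster sizes and effect size below are made up.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
clusters = [rng.normal(loc=0.3, scale=1.0, size=200) for _ in range(8)]   # 8 clusters
t = np.array([np.sqrt(len(c)) * c.mean() / c.std(ddof=1) for c in clusters])

observed = abs(t.mean())
# Under H0: mean = 0, the cluster statistics are approximately symmetric about zero,
# so recombine them under every one of the 2^8 sign assignments.
signs = np.array(list(itertools.product([-1, 1], repeat=len(t))))
randomized = np.abs((signs * t).mean(axis=1))
p_value = np.mean(randomized >= observed)
print(f"randomization p-value: {p_value:.3f}")
```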
Uncertain hypothesis test with application to uncertain regression analysis
2022
This paper first establishes uncertain hypothesis test as a mathematical tool that uses uncertainty theory to help people rationally judge whether some hypotheses are correct or not, according to observed data. As an application, uncertain hypothesis test is employed in uncertain regression analysis to test whether the estimated disturbance term and the fitted regression model are appropriate. In order to illustrate the test process, some numerical examples are documented.
Journal Article
Mind the Gap: Accounting for Measurement Error and Misclassification in Variables Generated via Data Mining
by Yang, Mochen; Adomavicius, Gediminas; Burtch, Gordon
in Analysis, Data mining, Econometric models
2018
The application of predictive data mining techniques in information systems research has grown in recent years, likely because of their effectiveness and scalability in extracting information from large amounts of data. A number of scholars have sought to combine data mining with traditional econometric analyses. Typically, data mining methods are first used to generate new variables (e.g., text sentiment), which are added into subsequent econometric models as independent regressors. However, because prediction is almost always imperfect, variables generated from the first-stage data mining models inevitably contain measurement error or misclassification. These errors, if ignored, can introduce systematic biases into the second-stage econometric estimations and threaten the validity of statistical inference. In this commentary, we examine the nature of this bias, both analytically and empirically, and show that it can be severe even when data mining models exhibit relatively high performance. We then show that this bias becomes increasingly difficult to anticipate as the functional form of the measurement error or the specification of the econometric model grows more complex. We review several methods for error correction and focus on two simulation-based methods, SIMEX and MC-SIMEX, which can be easily parameterized using standard performance metrics from data mining models, such as error variance or the confusion matrix, and can be applied under a wide range of econometric specifications. Finally, we demonstrate the effectiveness of SIMEX and MC-SIMEX by simulations and subsequent application of the methods to econometric estimations employing variables mined from three real-world data sets related to travel, social networking, and crowdfunding campaign websites.
The online appendix is available at https://doi.org/10.1287/isre.2017.0727.
Journal Article
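A bare-bones SIMEX sketch on synthetic data, not the authors' implementation: extra measurement error is added to the error-prone regressor at several levels, the model is refit, and the coefficient is extrapolated back to zero error (lambda = -1). The error variance `sigma_u2` is assumed known here, whereas in practice it would come from the data-mining model's performance metrics.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, sigma_u2 = 2000, 0.5
x_true = rng.normal(size=n)
y = 1.0 + 0.8 * x_true + rng.normal(size=n)
x_obs = x_true + rng.normal(scale=np.sqrt(sigma_u2), size=n)   # error-prone mined regressor

lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])     # levels of extra simulated error
avg_beta = []
for lam in lambdas:
    betas = []
    for _ in range(100):                          # average over simulated error draws
        x_sim = x_obs + rng.normal(scale=np.sqrt(lam * sigma_u2), size=n)
        betas.append(sm.OLS(y, sm.add_constant(x_sim)).fit().params[1])
    avg_beta.append(np.mean(betas))

# Quadratic extrapolation in lambda, evaluated at lambda = -1 (zero measurement error).
coefs = np.polyfit(lambdas, avg_beta, deg=2)
print("naive estimate:", round(avg_beta[0], 3))                     # attenuated toward zero
print("SIMEX estimate:", round(float(np.polyval(coefs, -1.0)), 3))  # pushed back toward the true 0.8
```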
A Robust Test for Weak Instruments
by Pflueger, Carolin; Olea, José Luis Montiel
in Autocorrelation, Clustered, Correlation analysis
2013
We develop a test for weak instruments in linear instrumental variables regression that is robust to heteroscedasticity, autocorrelation, and clustering. Our test statistic is a scaled nonrobust first-stage F statistic. Instruments are considered weak when the two-stage least squares or the limited information maximum likelihood Nagar bias is large relative to a benchmark. We apply our procedures to the estimation of the elasticity of intertemporal substitution, where our test cannot reject the null of weak instruments in a larger number of countries than the test proposed by Stock and Yogo in 2005. Supplementary materials for this article are available online.
Journal Article
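This is not the authors' effective F statistic, only the heteroscedasticity-robust first-stage Wald/F test on the excluded instruments that their procedure builds on; the actual test adds a Nagar-bias-based scaling and tabulated critical values, and the data below are simulated with a deliberately weak first stage.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
z = rng.normal(size=(n, 2))                                # two excluded instruments
x = 0.1 * z[:, 0] + 0.05 * z[:, 1] + rng.normal(size=n)    # weak first-stage relationship
Z = sm.add_constant(z)

first_stage = sm.OLS(x, Z).fit(cov_type="HC1")             # heteroscedasticity-robust covariance
R = np.array([[0.0, 1.0, 0.0],                             # joint null: both instrument
              [0.0, 0.0, 1.0]])                            # coefficients are zero
wald = first_stage.f_test(R)
print("robust first-stage F:", float(np.squeeze(wald.fvalue)))
print("p-value:             ", float(np.squeeze(wald.pvalue)))
```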