MBRL Search Results

45,139 result(s) for "statistical modelling"
On information quality
We define the concept of information quality 'InfoQ' as the potential of a data set to achieve a specific (scientific or practical) goal by using a given empirical analysis method. InfoQ is different from data quality and analysis quality, but is dependent on these components and on the relationship between them. We survey statistical methods for increasing InfoQ at the study design and post-data-collection stages, and we consider them relative to what we define as InfoQ. We propose eight dimensions that help to assess InfoQ: data resolution, data structure, data integration, temporal relevance, generalizability, chronology of data and goal, construct operationalization and communication. We demonstrate the concept of InfoQ, its components (what it is) and assessment (how it is achieved) through three case studies in on-line auctions research. We suggest that formalizing the concept of InfoQ can help to increase the value of statistical analysis and data mining, both methodologically and practically, thus contributing to a general theory of applied statistics.
Climate Change May Alter Rainfall‐Partitioning in Ways Unlikely To Be Detected in the Coming Decades
Droughts around the world have resulted in less annual streamflow relative to annual rainfall. The decrease in streamflow cannot be explained by changes in land‐use, precipitation or temperature alone. Rather, the decrease in annual streamflow has been linked to changes in annual rainfall‐partitioning. Climate change is predicted to induce similar, if not worse, conditions than previous droughts. This could make the disproportionate shifts in annual streamflow more frequent or intense. Currently, most methods assume annual rainfall‐partitioning remains constant and thus are unlikely to accurately predict how climate change could impact streamflow. Our aim is to conduct a thought‐experiment to examine the various ways climate change could alter annual rainfall‐partitioning and to determine what types of changes in rainfall‐partitioning are likely to be statistically detectable in the coming decades. Using a synthetic streamflow model, we show how changing the rainfall‐runoff intercept, rainfall‐runoff slope, lag‐1 autocorrelation, standard deviation per rainfall depth, and skewness per rainfall depth alters rainfall‐partitioning. Of all the rainfall‐partitioning changes examined, only the rainfall‐runoff intercept and slope are likely to be statistically detectable in the coming decades. Changes impacting the lag‐1 autocorrelation of annual streamflow, standard deviation per rainfall depth and skewness per rainfall depth are unlikely to be statistically detectable in the coming decades. Although unlikely to be statistically identified, these changes still alter both the rainfall‐runoff and streamflow‐time relationships. Rainfall‐partitioning changes that alter streamflow and are unlikely to be statistically detected in the coming decades will make prediction and allocation of water resources more challenging.
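As a rough illustration of the kind of synthetic streamflow model this abstract describes (not the authors' actual model), the sketch below generates annual streamflow from annual rainfall with an adjustable rainfall-runoff intercept, slope, and lag-1 autocorrelated noise; all parameter names and values are assumptions chosen for illustration.

```python
# Toy synthetic annual streamflow generator; NOT the authors' model.
# All parameter names and values are illustrative assumptions.
import numpy as np

def synthetic_streamflow(rainfall, intercept=-50.0, slope=0.45, rho=0.3,
                         sigma=30.0, seed=0):
    """Annual streamflow = intercept + slope * rainfall + AR(1) noise (mm)."""
    rng = np.random.default_rng(seed)
    rainfall = np.asarray(rainfall, dtype=float)
    eps = rng.normal(0.0, sigma, rainfall.size)
    noise = np.zeros(rainfall.size)
    noise[0] = eps[0]
    for t in range(1, rainfall.size):
        noise[t] = rho * noise[t - 1] + eps[t]       # lag-1 autocorrelated residuals
    q = intercept + slope * rainfall + noise
    return np.clip(q, 0.0, None)                     # streamflow cannot be negative

rain = np.random.default_rng(1).gamma(shape=8.0, scale=100.0, size=60)  # mm/yr
q_baseline = synthetic_streamflow(rain)                  # baseline partitioning
q_shifted = synthetic_streamflow(rain, slope=0.35)       # reduced runoff ratio
print(round(q_baseline.mean(), 1), round(q_shifted.mean(), 1))
```

Changing `slope` or `intercept` shifts the whole rainfall-runoff relation (the kind of change the abstract argues is detectable), whereas changing `rho` or `sigma` only alters the residual structure.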
Integrated statistical modeling method: part I—statistical simulations for symmetric distributions
The use of parametric and nonparametric statistical modeling methods differs depending on data sufficiency. For sufficient data, the parametric statistical modeling method is preferred owing to its high convergence to the population distribution. Conversely, for insufficient data, the nonparametric method is preferred owing to its high flexibility and conservative modeling of the given data. However, it is difficult for users to select either a parametric or nonparametric modeling method because the adequacy of using one of these methods depends on how well the given data represent the population model, which is unknown to users. For insufficient data or limited prior information on random variables, the interval approach, which uses interval information of data or random variables, can be used. However, it is still difficult to use in uncertainty analysis and design, owing to imprecise probabilities. In this study, to overcome this problem, an integrated statistical modeling (ISM) method, which combines the parametric, nonparametric, and interval approaches, is proposed. The ISM method uses the two-sample Kolmogorov–Smirnov (K–S) test to determine whether to use the parametric or the nonparametric method according to data sufficiency. Sequential statistical modeling (SSM) and kernel density estimation with estimated bounded data (KDE-ebd) are used as the parametric and nonparametric methods combined with the interval approach, respectively. To verify the modeling accuracy, conservativeness, and convergence of the proposed method, it is compared with the original SSM and KDE-ebd across various sample sizes and distribution types in simulation tests. Through an engineering and reliability analysis example, it is shown that the proposed ISM method has the highest accuracy and reliability in statistical modeling, regardless of data sufficiency. The ISM method is applicable to real engineering data and is conservative in reliability analysis for insufficient data, unlike SSM, and converges to the exact probability of failure more rapidly than KDE-ebd as data increase.
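A minimal sketch of the decision step described here, under assumptions of my own (a normal candidate distribution and SciPy's standard tools): a two-sample K-S test checks whether samples drawn from the fitted parametric model are consistent with the observed data, and the procedure falls back to kernel density estimation otherwise. This is an illustrative simplification, not the authors' full ISM/SSM/KDE-ebd implementation.

```python
# Rough sketch of a K-S-based choice between a parametric fit and a
# nonparametric fallback; not the authors' ISM/SSM/KDE-ebd implementation.
import numpy as np
from scipy import stats

def choose_model(data, alpha=0.05, n_ref=10_000, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = stats.norm.fit(data)                    # candidate parametric fit
    reference = stats.norm.rvs(mu, sigma, size=n_ref, random_state=rng)
    _, p_value = stats.ks_2samp(data, reference)        # two-sample K-S test
    if p_value >= alpha:
        return "parametric", stats.norm(mu, sigma)      # keep the parametric model
    return "nonparametric", stats.gaussian_kde(data)    # conservative fallback

data = stats.lognorm.rvs(0.8, size=30, random_state=42)  # small, skewed sample
kind, model = choose_model(data)
print(kind)
```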
Linking niche theory to ecological impacts of successful invaders: insights from resource fluctuation-specialist herbivore interactions
Theories of species coexistence and invasion ecology are fundamentally connected and provide a common theoretical framework for studying the mechanisms underlying successful invasions and their ecological impacts. Temporal fluctuations in resource availability and differences in life-history traits between invasive and resident species are considered likely drivers of the dynamics of invaded communities. Current critical issues in invasion ecology thus relate to the extent to which such mechanisms influence coexistence between invasive and resident species and to the ability of resident species to persist in an invasive-dominated ecosystem. We tested how a fluctuating resource and species trait differences may explain and help predict long-term impacts of biological invasions in forest specialist insect communities. We used a simple invasion system comprising closely related invasive and resident seed-specialized wasps (Hymenoptera: Torymidae) competing for a well-known fluctuating resource and displaying divergent diapause, reproductive and phenological traits. Based on extensive long-term field observations (1977–2010), we developed a combination of mechanistic and statistical models aiming to (i) obtain a realistic description of the population dynamics of these interacting species over time, and (ii) clarify the respective contributions of fluctuation-dependent and fluctuation-independent mechanisms to the long-term impact of invasion on the population dynamics of the resident wasp species. We showed that a fluctuation-dependent mechanism was unable to promote coexistence of the resident and invasive species. Earlier phenology of the invasive species was the main driver of invasion success, enabling the invader to exploit an empty niche. Phenology also had the greatest power to explain the long-term negative impact of the invasive on the resident species, through resource pre-emption. This study provides strong support for the critical role of species differences in interspecific competition outcomes within animal communities. Our mechanistic-statistical approach disentangles the critical drivers of novel species assemblages resulting from intentional and nonintentional introductions of non-native species.
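To make the resource pre-emption mechanism concrete, the toy discrete-time model below lets an earlier-emerging invader consume a fluctuating seed crop first and leaves the remainder to the resident; all functional forms and parameter values are invented for illustration and are not the authors' fitted mechanistic-statistical model.

```python
# Toy model of resource pre-emption by an earlier-emerging invader.
# Functional forms and parameters are invented, not the authors' fitted model.
import numpy as np

rng = np.random.default_rng(0)
years = 50
seeds = rng.gamma(shape=2.0, scale=500.0, size=years)   # fluctuating annual seed crop

invader, resident = 10.0, 100.0
survival, conversion, demand = 0.3, 0.3, 4.0            # assumed per-capita rates
for t in range(years):
    taken_inv = min(seeds[t], demand * invader)         # earlier phenology: first access
    leftover = seeds[t] - taken_inv
    taken_res = min(leftover, demand * resident)        # resident exploits the remainder
    invader = survival * invader + conversion * taken_inv
    resident = survival * resident + conversion * taken_res

print(f"after {years} years: invader ~ {invader:.0f}, resident ~ {resident:.2f}")
```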
3D Analysis of the Proximal Femur Compared to 2D Analysis for Hip Fracture Risk Prediction in a Clinical Population
Due to the adverse impacts of hip fractures on patients’ lives, it is crucial to enhance the identification of people at high risk through accessible clinical techniques. Reconstructing the 3D geometry and BMD distribution of the proximal femur could be beneficial in enhancing hip fracture risk predictions; however, it is associated with a high computational burden. It is also not clear whether it provides better performance than 2D model analysis. Therefore, the purpose of this study was to compare the ability of 2D and 3D model reconstruction to predict hip fracture risk in a clinical population of patients. The DXA scans and CT scans of 16 cadaveric femurs were used to create training sets for the 2D and 3D model reconstruction based on statistical shape and appearance modeling. Subsequently, these methods were used to predict the risk of sustaining a hip fracture in a clinical population of 150 subjects (50 fractured and 100 non-fractured) that were monitored for five years in the Canadian Multicentre Osteoporosis Study. 3D model reconstruction identified patients who sustained a hip fracture more accurately than standard clinical practice (by 40%). Also, the predictions from the 2D statistical model did not differ significantly from the 3D ones (p > 0.76). These results indicate that, for enhancing hip fracture risk prediction in clinical practice, 2D statistical modeling offers comparable performance with a lower computational load.
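Statistical shape (and appearance) models of the kind referred to here are typically built by running principal component analysis on aligned training geometries. The sketch below shows that core step on random stand-in landmark data rather than the study's DXA/CT training sets; the data and dimensions are assumptions.

```python
# Minimal PCA-based statistical shape model on stand-in landmark data;
# illustrates the core step, not the study's DXA/CT reconstruction pipeline.
import numpy as np

rng = np.random.default_rng(0)
n_shapes, n_landmarks = 16, 40
shapes = rng.normal(size=(n_shapes, n_landmarks * 2))   # flattened (x, y) landmarks

mean_shape = shapes.mean(axis=0)
centered = shapes - mean_shape
# Principal modes of shape variation (rows of Vt), via SVD of the centered data.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
n_modes = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)  # keep ~95% variance

# Generate a new plausible shape from mode weights b: shape = mean + b @ modes.
b = rng.normal(scale=s[:n_modes] / np.sqrt(n_shapes - 1))
new_shape = mean_shape + b @ Vt[:n_modes]
print(n_modes, new_shape.shape)
```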
A Review of Electric Vehicle Load Open Data and Models
The field of electric vehicle charging load modelling has been growing rapidly in the last decade. In light of the Paris Agreement, it is crucial to keep encouraging better modelling techniques to support successful electric vehicle adoption. Additionally, numerous papers highlight the lack of charging station data available for building models that are consistent with reality. In this context, the purpose of this article is threefold. First, to provide the reader with an overview of the open datasets that are available and ready to use, in order to foster reproducible research in the field. Second, to review electric vehicle charging load models with their strengths and weaknesses. Third, to provide suggestions on matching the models reviewed to six datasets found in this research that have not previously been explored in the literature. The open data search covered more than 860 repositories and yielded around 60 datasets that are relevant for modelling electric vehicle charging load. These datasets include information on charging point locations, historical and real-time charging sessions, traffic counts, travel surveys and registered vehicles. The models reviewed range from statistical characterization to stochastic processes and machine learning, and the context of their application is assessed.
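As one generic example of the "stochastic processes" model family mentioned here (not any specific model from the review), the sketch below simulates charging sessions as a time-varying Poisson process and aggregates them into an hourly load profile; the arrival rates, session energies, and charging power are assumed values.

```python
# Generic stochastic charging-load sketch: Poisson session arrivals per hour,
# lognormal session energy, fixed charging power. All values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
sessions_per_hour = np.array([1, 1, 1, 1, 2, 4, 8, 12, 10, 6, 5, 5,
                              6, 6, 5, 6, 9, 12, 10, 7, 5, 3, 2, 1], float)
power_kw = 7.4                                   # assumed AC charging power

load_kwh = np.zeros(24)                          # energy delivered in each hour
for hour, rate in enumerate(sessions_per_hour):
    for _ in range(rng.poisson(rate)):
        energy = rng.lognormal(mean=2.0, sigma=0.5)       # kWh, median ~7.4
        start = hour + rng.uniform()                      # start within the hour
        end = start + energy / power_kw                   # charge at constant power
        for h in range(int(start), min(24, int(np.ceil(end)))):
            overlap = min(end, h + 1) - max(start, h)     # charging time within hour h
            load_kwh[h] += overlap * power_kw

print(np.round(load_kwh, 1))   # one simulated day's aggregate charging profile
```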
Probabilistic reconstructions of local temperature and soil moisture from tree-ring data with potentially time-varying climatic response
We explore a probabilistic, hierarchical Bayesian approach to the simultaneous reconstruction of local temperature and soil moisture from tree-ring width observations. The model explicitly allows for differing calibration and reconstruction interval responses of the ring-width series to climate due to slow changes in climatology coupled with the biological climate thresholds underlying tree-ring growth. A numerical experiment performed using synthetically generated data demonstrates that bimodality can occur in posterior estimates of past climate when the data do not contain enough information to determine whether temperature or moisture limitation controlled reconstruction-interval tree-ring variability. This manifestation of nonidentifiability is a result of the many-to-one mapping from bivariate climate to time series of tree-ring widths. The methodology is applied to reconstruct temperature and soil moisture conditions over the 1080–1129 C.E. interval at Methuselah Walk in the White Mountains of California, where co-located isotopic dendrochronologies suggest that observed moisture limitations on tree growth may have been alleviated. Our model allows for assimilation of both data sources, and computation of the probability of a change in the climatic controls on ring-width relative to those observed in the calibration period. While the probability of a change in control is sensitive to the choice of prior distribution, the inference that conditions were moist and cool at Methuselah Walk during the 1080–1129 C.E. interval is robust. Results also illustrate the power of combining multiple proxy data sets to reduce uncertainty in reconstructions of paleoclimate.
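The nonidentifiability point can be seen in a much simpler setting than the paper's hierarchical model: if ring width responds to the minimum of a temperature response and a moisture response (a common limiting-factor assumption), a single observed width is compatible with both a "temperature-limited" and a "moisture-limited" explanation. The grid computation below is a didactic sketch of that many-to-one mapping, not the authors' model; the response curves and noise level are assumptions.

```python
# Toy grid posterior illustrating the many-to-one mapping from (T, M) to ring
# width under a limiting-factor growth model; didactic sketch only.
import numpy as np

def ramp(x, lo, hi):
    """Piecewise-linear growth response, clipped to [0, 1]."""
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

temp = np.linspace(-2, 2, 201)          # standardized temperature grid
moist = np.linspace(-2, 2, 201)         # standardized soil-moisture grid
T, M = np.meshgrid(temp, moist, indexing="ij")
growth = np.minimum(ramp(T, -1.0, 1.0), ramp(M, -1.0, 1.0))   # limiting factor

w_obs, sigma = 0.5, 0.05                # one observed (scaled) ring width
loglik = -0.5 * ((w_obs - growth) / sigma) ** 2
post = np.exp(loglik - loglik.max())
post /= post.sum()                      # flat prior over the grid

temp_marginal = post.sum(axis=1)        # marginal posterior over temperature
# Mass splits between "T is the limiting factor" (a peak where ramp(T) = 0.5)
# and "M is limiting, T merely non-limiting" (a plateau at higher T).
print(temp[np.argmax(temp_marginal)], round(temp_marginal.max(), 3))
```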
Alternative measures to evaluate the accuracy and bias of genomic predictions with censored records
This study aimed to propose and compare metrics of accuracy and bias for genomic prediction of breeding values for traits with censored data. Genotypic and censored phenotypic information were simulated for four traits with the following QTL and polygenic heritabilities, respectively: C1: 0.07-0.07, C2: 0.07-0.00, C3: 0.27-0.27, and C4: 0.27-0.00. Genomic breeding values were predicted using the Mixed Cox and Truncated Normal models. The accuracy of the models was estimated based on the Pearson correlation (PC), maximal correlation (MC), and Pearson correlation for censored data (PCC), while genomic bias was calculated via simple linear regression (SLR) and Tobit regression (TB). MC and PCC were statistically superior to PC for trait C3 with 10% and 40% censored information; at 70% censoring, PCC yielded better results than MC and PC. For the other traits, the proposed measures were superior or statistically equal to PC. The coefficients associated with the marginal effects (TB) presented estimates close to those obtained with the SLR method, while the coefficient related to the latent variable showed an almost unchanged pattern as censoring increased in most cases. From a statistical point of view, the use of methodologies for censored data should be prioritized, even for low censoring percentages.
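For reference, the two standard ingredients named here, the Pearson correlation for accuracy and the simple-linear-regression slope for bias, can be computed as in the sketch below on simulated breeding values; the paper's proposed MC and PCC measures and its Tobit-based bias metric are not reproduced.

```python
# Standard accuracy (Pearson correlation) and bias (SLR slope) of genomic
# predictions on simulated breeding values; the proposed MC/PCC metrics are
# specific to the paper and are not reproduced here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_bv = rng.normal(size=500)                              # simulated true breeding values
pred_bv = 0.8 * true_bv + rng.normal(scale=0.6, size=500)   # imperfect predictions

pc, _ = stats.pearsonr(pred_bv, true_bv)          # accuracy (PC)
slr = stats.linregress(pred_bv, true_bv)          # bias: regress true on predicted
print(f"accuracy (PC) = {pc:.2f}, bias (SLR slope) = {slr.slope:.2f}")
# A slope close to 1 indicates predictions with unbiased dispersion.
```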
Assessment of the Impacts of Urbanization on Landslide Susceptibility in Hakha City, a Mountainous Region of Western Myanmar
In July 2015, more than 100 landslides caused by Cyclone Komen resulted in damage to approximately 1000 buildings in the mountainous region of Hakha City, Myanmar. This study aimed to identify potential landslide susceptibility for newly developed resettlement areas in Hakha City before and after urbanization. The study evaluated landslide susceptibility through statistical modeling and compared the level of susceptibility before and after urbanization in the region. The information value model was used to predict landslide susceptibility before and after urbanization, using 10 parameter maps as independent variables and 1 landslide inventory map as the dependent variable. Four landslide types were identified in the study area: shallow earth slide, deep slide, earth slump, and debris flow. Susceptibility analyses were conducted separately for each type to better recognize the different aspects of landslide susceptibility in planned urban areas. By comparing the susceptibility index before and after urbanization, suitable urban areas with lower landslide susceptibility could be identified. The results showed that high-potential landslide susceptibility increased by 10%, 16%, and 5% after urbanization, compared with before urbanization, in the three Town Plans, respectively. Therefore, Town Plan 3 was selected as the most suitable location for the resettlement area in terms of low landslide risk.
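The information value model rests on a simple per-class calculation: the natural log of the ratio between the landslide density in a class and the overall landslide density. The sketch below shows that calculation with made-up class counts; the study's 10 parameter maps and landslide inventory are not reproduced.

```python
# Core calculation behind the information value model:
# IV = ln( (landslide cells in class / all landslide cells)
#          / (cells in class / all cells) ), per class of a parameter map.
# Class names and counts below are made-up illustrations.
import numpy as np

classes = ["gentle slope", "moderate slope", "steep slope"]
cells_in_class = np.array([60_000, 30_000, 10_000])       # cells per slope class
landslides_in_class = np.array([20, 60, 120])             # landslide cells per class

p_class = cells_in_class / cells_in_class.sum()
p_slide = landslides_in_class / landslides_in_class.sum()
info_value = np.log(p_slide / p_class)                    # > 0 means more susceptible

for name, iv in zip(classes, info_value):
    print(f"{name:>15}: IV = {iv:+.2f}")
# Summing the IVs of each cell's classes across all parameter maps gives the
# susceptibility index that is compared before and after urbanization.
```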
What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm
The K-means algorithm is one of the most popular clustering algorithms in current use, as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm, which we call MAP-DP (maximum a posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means, with MAP-DP convergence typically achieved on the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross-validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.
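MAP-DP itself is not part of standard libraries; as a readily available stand-in for the same idea (a Dirichlet-process mixture that infers the number of clusters from the data), the sketch below contrasts K-means with scikit-learn's BayesianGaussianMixture on data that violates K-means' equal, spherical-cluster assumption.

```python
# K-means vs. a Dirichlet-process Gaussian mixture (variational, scikit-learn)
# on unequal, differently spread clusters. This is a stand-in illustration of
# the idea behind MAP-DP, not the authors' MAP-DP algorithm.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import BayesianGaussianMixture

# Two clusters with very different sizes and spreads.
X, _ = make_blobs(n_samples=[500, 50], centers=[[0, 0], [4, 0]],
                  cluster_std=[2.0, 0.3], random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

dpgmm = BayesianGaussianMixture(
    n_components=10,            # upper bound; unneeded components get negligible weight
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)
labels = dpgmm.predict(X)

print("K-means cluster sizes:", np.bincount(km.labels_))
print("DP-mixture components used:", len(np.unique(labels)))
```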