3,970 result(s) for "variable ranking"
VARIABLE SELECTION AND PREDICTION WITH INCOMPLETE HIGH-DIMENSIONAL DATA
We propose a Multiple Imputation Random Lasso (MIRL) method to select important variables and to predict the outcome for an epidemiological study of Eating and Activity in Teens. In this study 80% of individuals have at least one variable missing. Therefore, using variable selection methods developed for complete data after listwise deletion substantially reduces prediction power. Recent work on prediction models in the presence of incomplete data cannot adequately account for large numbers of variables with arbitrary missing patterns. We propose MIRL to combine penalized regression techniques with multiple imputation and stability selection. Extensive simulation studies are conducted to compare MIRL with several alternatives. MIRL outperforms other methods in high-dimensional scenarios in terms of both reduced prediction error and improved variable selection performance, and it has a greater advantage when the correlation among variables is high and the missing proportion is high. MIRL is shown to have improved performance compared with other applicable methods when applied to the study of Eating and Activity in Teens for the boys and girls separately, and to a subgroup of low socioeconomic status (SES) Asian boys who are at high risk of developing obesity.
WiBB: an integrated method for quantifying the relative importance of predictive variables
A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. This is especially difficult in ecology, which is intrinsically rich in candidate predictors. An efficient statistical procedure to evaluate the relative importance of predictors in regression models is highly desirable. However, previous studies criticised the most universally applicable method, by pointing out the low discriminating power of the importance index in simulated datasets. Here we proposed a new index, WiBB, which integrates the merits of several existing methods. WiBB combines a model‐weighting method from information theory (Wi), a standardised regression coefficient method measured by β* (B), and bootstrap resampling technique (B). We applied the WiBB in simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied it to an empirical dataset of a plant genus Mimulus to select bioclimatic predictors of species' presence across the landscape. Results in the simulated datasets showed that the bootstrap resampling technique significantly improved the discriminant ability by correctly sorting the orders of relative importance of predictors. The WiBB method outperformed the β* and the relative sum of weights (SWi, a standardised version of sum of weights) methods in scenarios with small and large sample sizes, respectively. When testing WiBB in the empirical dataset with GLM, it sensibly identified four important predictors with high credibility out of six candidates in modelling geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance and hence reducing the dimensionality of data, without losing interpretive power. The simplicity of calculation of the new metric over more sophisticated statistical procedures makes it a handy method in the statistical toolbox.
Improving niche projections of plant species under climate change: Silene acaulis on the British Isles as a case study
Empirical works to assist in choosing climatically relevant variables in the attempt to predict climate change impacts on plant species are limited. Further uncertainties arise in choice of an appropriate niche model. In this study we devised and tested a sharp methodological framework, based on stringent variable ranking and filtering and flexible model selection, to minimize uncertainty in both niche modelling and successive projection of plant species distributions. We used our approach to develop an accurate, parsimonious model of Silene acaulis (L.) presence/absence on the British Isles and to project its presence/absence under climate change. The approach suggests the importance of (a) defining a reduced set of climate variables, actually relevant to species presence/absence, from an extensive list of climate predictors, and (b) considering climate extremes instead of, or together with, climate averages in projections of plant species presence/absence under future climate scenarios. Our methodological approach reduced the number of relevant climate predictors by 95.23% (from 84 to only 4), while simultaneously achieving high cross-validated accuracy (97.84%) confirming enhanced model performance. Projections produced under different climate scenarios suggest that S. acaulis will likely face climate-driven fast decline in suitable areas on the British Isles, and that upward and northward shifts to occupy new climatically suitable areas are improbable in the future. Our results also imply that conservation measures for S. acaulis based upon assisted colonization are unlikely to succeed on the British Isles due to the absence of climatically suitable habitat, so different conservation actions (seed banks and/or botanical gardens) are needed.
A Fault Isolation Method via Classification and Regression Tree-Based Variable Ranking for Drum-Type Steam Boiler in Thermal Power Plant
Accurate detection and isolation of possible faults are indispensable for operating complex industrial processes more safely, effectively, and economically. In this paper, we propose a fault isolation method for steam boilers in thermal power plants via classification and regression tree (CART)-based variable ranking. In the proposed method, binary classification trees are constructed by applying the CART algorithm to a training dataset composed of normal and faulty samples for classifier learning. Then, to perform faulty variable isolation, variable importance values for each input variable are extracted from the constructed trees. The importance values for non-faulty variables are not influenced by faulty variables, because the values are extracted from trees with decision boundaries only in the original input space; the proposed method therefore does not suffer from the smearing effect. Furthermore, the proposed method, based on the nonparametric CART classifier, is applicable to nonlinear processes. To confirm its effectiveness, the proposed and comparison methods are applied to two benchmark problems and a 250 MW drum-type steam boiler. Experimental results show that the proposed method isolates faulty variables more clearly than the comparison methods, without the smearing effect.
A Variable Ranking Method for Machine Learning Models with Correlated Features: In-Silico Validation and Application for Diabetes Prediction
When building a model to predict a clinical outcome using machine learning techniques, model developers are often interested in ranking the features according to their predictive ability. A commonly used approach to obtain a robust variable ranking is to apply recursive feature elimination (RFE) on multiple resamplings of the training set and then to aggregate the ranking results using the Borda count method. However, the presence of highly correlated features in the training set can degrade the ranking performance. In this work, we propose a variant of the method based on RFE and Borda count that takes into account the correlation between variables during the ranking procedure in order to improve the ranking performance in the presence of highly correlated features. The proposed algorithm is tested on simulated datasets in which the true variable importance is known and compared to the standard RFE-Borda count method. According to the root mean square error between the estimated rank and the true (i.e., simulated) feature importance, the proposed algorithm outperforms the standard RFE-Borda count method. Finally, the proposed algorithm is applied to a case study related to the development of a predictive model of type 2 diabetes onset.
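The aggregation step mentioned in this abstract can be illustrated with a minimal sketch of a plain Borda count. This is the standard method only, not the correlation-aware variant the authors propose, and the feature names and per-resampling rankings are hypothetical. Each ranking lists features best first, and a feature in position p of an n-item ranking earns n - 1 - p points.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate several rankings (best feature first) with a plain Borda count."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            # Top position earns n - 1 points, last position earns 0.
            scores[feature] += n - 1 - position
    # Highest total score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings, e.g. one per resampling of the training set.
rankings = [
    ["glucose", "bmi", "age"],
    ["bmi", "glucose", "age"],
    ["glucose", "age", "bmi"],
]
aggregated = borda_aggregate(rankings)
print(aggregated)
```

In this sketch every ranking is assumed to contain the same set of features; rankings of different lengths would need a scoring convention that the abstract does not specify.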
Influence of mountain pine beetle outbreaks on large fires in British Columbia
A key uncertainty in understanding climate change effects on wildfires in western North America is the role of mountain pine beetle (MPB) outbreaks in driving wildfire occurrence and severity. In this study, we investigated the complex relationship between MPB outbreaks, other environmental factors, and wildfire occurrence in British Columbia (BC), Canada. We adopted a fire risk analysis method developed for fire occurrence prediction to separate the effect of changing fuel conditions on wildfires in BC when neither post‐outbreak fuel conditions, climate, nor management is stationary. Using lasso‐logistic regression and a novel variable ranking procedure, we determined that MPB‐affected areas had 1.7 times more large lightning‐caused fires (≥100 ha), as the likelihood of large lightning‐caused fires increased by 40% in these areas and likely contributed to the increased burned areas in BC. Meanwhile, the likelihood of large human‐caused fires decreased in MPB‐affected areas. Fire weather factors were most influential for both lightning‐ and human‐caused fires, while anthropogenic factors were most influential for human‐caused fires. Fuel dynamics following MPB outbreaks vary across the wide distribution of a host species such as lodgepole pine, at stand and landscape levels. Furthermore, the expression of the effects of MPB and other disturbances on wildfire is also conditional on, as well as confounded with, many other environmental factors and management activities that vary across western North America. Therefore, a lack of consensus on the impacts of MPB on wildfire is not surprising.
A PCA-based variable ranking and selection approach for electric energy load forecasting
Purpose: This paper aims to propose an approach based upon principal component analysis (PCA) to define a contribution rate for each variable and then select the main variables as inputs to a neural network for energy load forecasting in southeastern Brazil. Design/methodology/approach: The proposed approach defines the contribution rate of each variable as a weighted sum of the inner product between the variable and each principal component. The contribution rate is then used for selecting the most important features of 27 variables and 6,815 electricity data for a multilayer perceptron network backpropagation prediction model. Several tests, starting from the most significant variable as input and adding the next most significant variable and so on, are carried out to predict energy load (GWh). The Kaiser–Meyer–Olkin and Bartlett sphericity tests were used to verify the overall consistency of the data for factor analysis. Findings: Although energy load forecasting is an area for which databases with tens or hundreds of variables are available, the approach could select only six variables that contribute more than 85% to the model. While the contribution rates of the variables of the plants, plus energy exchange added, have only 14.14% of contribution, the stored energy variable has a contribution rate of 26.31%, making it fundamental for the prediction accuracy. Originality/value: Besides improving the forecasting accuracy and providing a faster predictor, the proposed PCA-based approach for calculating the contribution rate of input variables provides a better understanding of the underlying process that generated the data, which is fundamental to the Brazilian reality due to the accentuated climatic and economic variations.
PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection
Variable selection has consistently been a hot topic in linear regression models, especially when faced with high-dimensional data. Variable ranking, an advanced form of selection, is actually more fundamental since selection can be realized by thresholding once the variables are ranked suitably. In recent years, ensemble learning has gained significant interest in the context of variable selection due to its great potential to improve selection accuracy and to reduce the risk of falsely including some unimportant variables. Motivated by the widespread success of boosting algorithms, a novel ensemble method, PBoostGA, is developed in this paper to implement variable ranking and selection in linear regression models. In PBoostGA, a weight distribution is maintained over the training set and a genetic algorithm is adopted as its base learner. Initially, equal weight is assigned to each instance. According to a weight updating and ensemble member generating mechanism like that of AdaBoost.RT, a series of slightly different importance measures are sequentially produced for each variable. Finally, the candidate variables are ordered in light of the average importance measure and some significant variables are then selected by a thresholding rule. Both simulation results and a real data illustration show the effectiveness of PBoostGA in comparison with some existing counterparts. In particular, PBoostGA has a stronger ability to exclude redundant variables.
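The relationship between ranking and selection described in this abstract (rank first, then select by thresholding) can be sketched as follows. This is only an illustration of that final step, assuming importance scores are already available; it does not reproduce PBoostGA itself, and the variable names, scores, and threshold are hypothetical.

```python
def rank_and_select(importance, threshold):
    """Rank variables by importance, then keep those at or above a threshold."""
    # Sort by descending importance; ties are broken by name for determinism.
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    ranking = [name for name, _ in ranked]
    selected = [name for name, score in ranked if score >= threshold]
    return ranking, selected


# Hypothetical average importance measures for three candidate variables.
importance = {"x1": 0.9, "x2": 0.1, "x3": 0.6}
ranking, selected = rank_and_select(importance, threshold=0.5)
print(ranking)
print(selected)
```

Changing the threshold changes only the selected subset, while the ranking itself stays fixed, which is the sense in which ranking is the more fundamental step.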
Assessing variable importance in clustering: a new method based on unsupervised binary decision trees
We consider different approaches for assessing variable importance in clustering. We focus on clustering using binary decision trees (CUBT), which is a non-parametric top-down hierarchical clustering method designed for both continuous and nominal data. We suggest a measure of variable importance for this method similar to the one used in Breiman’s classification and regression trees. This score is useful to rank the variables in a dataset, to determine which variables are the most important or to detect the irrelevant ones. We analyze both stability and efficiency of this score on different data simulation models in the presence of noise, and compare it to other classical variable importance measures. Our experiments show that variable importance based on CUBT is much more efficient than other approaches in a large variety of situations.
A Study of Some Important Issues for a Muslim in the Month of Ramadan
The month of Ramadan is associated with many acts of worship that may exhaust a weak body. The most important acts of worship in Ramadan are fasting and prayer. Some prayers require long periods of sitting and standing, in addition to prayer movements. Many Muslims spend a long time in the mosque, sitting on the ground for reading or standing for prayer. The questionnaire focuses on those who spend a long time in the mosque. Seven attributes were considered to understand what affects the worshiper’s health, especially for those with chronic diseases. We found that there is a strong relationship between age and chronic diseases, which may increase with age. Also, the elderly cannot continue to pray for more than half an hour, and then only if they balance standing with sitting on a chair when needed, to prevent bodily disorder and the emergence of symptoms of chronic disease.