Catalogue Search | MBRL
Explore the vast range of titles available.
5,772 result(s) for "regression and deviation from regression"
Integrating BLUP, AMMI, and GGE Models to Explore GE Interactions for Adaptability and Stability of Winter Lentils (Lens culinaris Medik.)
2023
Lentil yield is a complex quantitative trait that is significantly influenced by the environment. It is crucial for improving human health and nutritional security in the country as well as for a sustainable agricultural system. The study was laid out to identify stable genotypes by dissecting the G × E interaction with AMMI and GGE biplot analyses, and to identify superior genotypes using 33 parametric and non-parametric stability statistics for 10 genotypes across four different environments. The total G × E effect was divided into two primary components by the AMMI model. For days to flowering, days to maturity, plant height, pods per plant, and hundred seed weight, IPCA1 was significant and accounted for 83%, 75%, 100%, and 62%, respectively. Both IPCA1 and IPCA2 were non-significant for yield per plant and accounted for 62% of the overall G × E interaction. Eight of the estimated stability parameters showed strong positive correlations with mean seed yield, and these measures can be used to choose stable genotypes. According to the AMMI biplot, lentil productivity varied greatly across environments, ranging from 786 kg per ha in the MYM environment to 1658 kg per ha in the ISD environment. Three genotypes (G8, G7, and G2) were shown to be the most stable based on non-parametric stability scores for grain yield. G8, G7, G2, and G5 were identified as the top lentil genotypes for grain production using numerical stability metrics such as Francis’s coefficient of variation, Shukla’s stability variance (σi2), and Wricke’s ecovalence (Wi). Genotypes G7, G10, and G4 were the most stable with the highest yield, according to BLUP-based simultaneous selection stability characteristics. The findings of graphic stability methods such as AMMI and GGE for identifying high-yielding and stable lentil genotypes were very similar: while the GGE biplot indicated G2, G10, and G7 as the most stable and high-producing genotypes, AMMI analysis identified G2, G9, G10, and G7. These selected genotypes could be used in releasing a new variety. Considering all the stability models, namely Eberhart and Russell’s regression and deviation from regression, additive main effects and multiplicative interaction (AMMI) analysis, and GGE, the genotypes G2, G9, and G7 could be used as well-adapted genotypes with moderate grain yield in all tested environments.
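The numerical stability metrics cited in this abstract are simple functions of a genotype × environment yield table. As a hedged illustration (not the authors' code), a minimal NumPy sketch of Wricke's ecovalence and Francis's coefficient of variation on a hypothetical complete table:

```python
import numpy as np

# hypothetical complete table: rows = 10 genotypes, cols = 4 environments
yields = np.random.default_rng(0).normal(1200.0, 150.0, size=(10, 4))

g_mean = yields.mean(axis=1, keepdims=True)   # genotype means
e_mean = yields.mean(axis=0, keepdims=True)   # environment means
grand = yields.mean()

# Wricke's ecovalence Wi: each genotype's share of the G x E interaction SS
ecovalence = ((yields - g_mean - e_mean + grand) ** 2).sum(axis=1)

# Francis's coefficient of variation: sd across environments / genotype mean
cv = 100.0 * yields.std(axis=1, ddof=1) / g_mean.ravel()

# lower Wi and lower CV both indicate a more stable genotype
```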
Journal Article
An aggregate and iterative disaggregate algorithm with proven optimality in machine learning
2016
We propose a clustering-based iterative algorithm to solve certain optimization problems in machine learning: we start by aggregating the original data and solving the problem on the aggregated data, then in subsequent steps gradually disaggregate it. We apply the algorithm to common machine learning problems such as the least absolute deviation regression problem, support vector machines, and semi-supervised support vector machines. We derive model-specific data aggregation and disaggregation procedures, and we establish optimality and convergence results and bound the optimality gap of the approximate solution at each iteration. A computational study is provided.
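As a rough sketch of the aggregate-then-disaggregate idea for the least absolute deviation case: the k-means clustering, the splitting rule, and the cluster counts below are illustrative assumptions, not the paper's model-specific procedures.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.laplace(size=5000)

labels = KMeans(n_clusters=50, n_init=10, random_state=0).fit_predict(X)

for _ in range(3):                            # a few aggregation levels
    ids = np.unique(labels)
    # aggregate: one representative point (centroid, mean response) per cluster
    Xa = np.vstack([X[labels == c].mean(axis=0) for c in ids])
    ya = np.array([y[labels == c].mean() for c in ids])
    fit = sm.QuantReg(ya, sm.add_constant(Xa)).fit(q=0.5)  # LAD = median regression

    # disaggregate: split the clusters that fit the current solution worst
    pred = sm.add_constant(X) @ np.asarray(fit.params)
    spread = np.array([np.abs(y[labels == c] - pred[labels == c]).mean() for c in ids])
    for c in ids[np.argsort(spread)[-10:]]:   # illustrative splitting rule
        idx = np.flatnonzero(labels == c)
        if idx.size > 1:
            sub = KMeans(n_clusters=2, n_init=5, random_state=0).fit_predict(X[idx])
            labels[idx[sub == 1]] = labels.max() + 1
```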
Journal Article
Autoregressive-based outlier algorithm to detect money laundering activities
2017
Purpose
Because of the large volume of non-uniform transactions per day, money laundering detection (MLD) is a time-consuming and difficult process. The major purpose of the proposed auto-regressive (AR) outlier-based MLD (AROMLD) is to reduce the time required to handle large volumes of non-uniform transactions.
Design/methodology/approach
The AR-based outlier design produces consistent, asymptotically distributed results that enhance demand-forecasting ability. In addition, the inter-quartile range (IQR) formulations proposed in this paper support detailed analysis of time-series data pairs.
Findings
High dimensionality and the difficulty of characterizing the relationships and differences between data pairs make time-series mining a complex task. The presence of domain invariance in time-series mining motivates a regressive formulation for outlier detection, and the deep analysis of the time-varying process together with the demands of forecasting motivates combining the AR and IQR formulations for effective outlier detection.
Research limitations/implications
The present research focuses on detecting outliers in past financial transactions using the AR model; predicting the possibility of an outlier in future transactions remains a major open issue.
Originality/value
Without prior segmentation, ML detection suffers from high dimensionality, and the absence of a boundary isolating normal from suspicious transactions is a further limitation. The regression formulation overcomes the lack of deep analysis and the high time consumption.
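A minimal sketch of the general AR-plus-IQR idea the abstract describes, using synthetic transaction data and the conventional 1.5 × IQR fences as assumptions (the paper's exact formulations may differ):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# hypothetical daily transaction totals for one account
rng = np.random.default_rng(2)
amounts = rng.gamma(shape=2.0, scale=500.0, size=365)
amounts[200] += 25_000          # planted anomaly

# fit an AR model; residuals measure deviation from autoregressive behaviour
lags = 7
resid = AutoReg(amounts, lags=lags).fit().resid

# IQR fences on the residuals flag suspicious days
q1, q3 = np.percentile(resid, [25, 75])
fence = 1.5 * (q3 - q1)
suspicious_days = np.flatnonzero((resid < q1 - fence) | (resid > q3 + fence)) + lags
```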
Journal Article
Best Subset Selection via a Modern Optimization Lens
2016
In the period 1991-2015, algorithmic advances in Mixed Integer Optimization (MIO) coupled with hardware improvements have resulted in an astonishing 450 billion factor speedup in solving MIO problems. We present a MIO approach for solving the classical best subset selection problem of choosing k out of p features in linear regression given n observations. We develop a discrete extension of modern first-order continuous optimization methods to find high quality feasible solutions that we use as warm starts to a MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with n in the 1000s and p in the 100s in minutes to provable optimality, and finds near optimal solutions for n in the 100s and p in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than Lasso and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power.
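The discrete first-order method the abstract refers to is, at its core, projected gradient descent onto k-sparse vectors (hard thresholding), used to warm-start the MIO solver. A hedged NumPy sketch of that warm-start stage alone, omitting the MIO model:

```python
import numpy as np

def discrete_first_order(X, y, k, iters=500):
    """Projected gradient onto k-sparse vectors (hard thresholding).

    Produces a warm start for an MIO solver; step size 1/L, with L the
    largest eigenvalue of X'X. A sketch, not the paper's full algorithm.
    """
    L = np.linalg.eigvalsh(X.T @ X).max()
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ beta - y)
        step = beta - grad / L
        keep = np.argsort(np.abs(step))[-k:]   # keep the k largest entries
        new = np.zeros_like(beta)
        new[keep] = step[keep]
        if np.allclose(new, beta):
            break
        beta = new
    # polish: least squares restricted to the chosen support
    support = np.flatnonzero(beta)
    beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta
```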
Journal Article
Estimation of irrigation water quality index with development of an optimum model: a case study
2020
Surface water quality parameters are important means for determining water’s suitability for irrigation. In this research, data from 32 irrigation stations were used to calculate the sodium adsorption ratio (SAR), sodium percentage (Na%), Kelly index (KI), permeability index (PI) and irrigation water quality index (IWQI) for evaluation of surface water quality. The obtained SAR, KI and Na% values varied between 0.10 and 9.43, 0.03 and 1.37 meq/l, and 3.16 and 57.82%, respectively. The calculated PI values indicate that 93.75% of the water samples are in the “suitable” category and 6.25% are in the “non-suitable” category. The IWQI values obtained from the research area varied between 30.59 and 81.09. In terms of irrigation water quality, 12.5% of the samples are of “good” quality, 15.62% are of “poor” quality, 68.75% are of “very poor” quality, and 3.12% are of “non-suitable” quality. Accordingly, the IWQI value was estimated on the basis of the SAR, Na%, KI and PI values using multiple regression and an artificial neural network (ANN) model. The coefficient of determination (R2) was 0.6 in the multiple regression analysis, and a moderately significant relationship (p < 0.05) was detected. As the calculated F value was higher than the tabulated F value, a real relationship between the dependent and independent variables is inferred. Four different models were built with the ANN, and the statistical performance of the models was determined using statistical parameters such as average value (µ), standard error (SE), standard deviation (σ), R2, root mean square error (RMSE) and mean absolute percentage error (MAPE). The training R2 value of the best model was very high (0.99). The relation between the estimation results of the ANN model and the experimental data (R2 = 0.92) verifies the model’s success. As a result, the ANN proved to be a successful means of IWQI estimation using different water quality parameters.
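The indices named in this abstract follow standard textbook formulas on ionic concentrations in meq/l. A short hedged sketch of those definitions (the ANN and IWQI aggregation steps are omitted):

```python
import numpy as np

def irrigation_indices(na, ca, mg, k, hco3):
    """Standard irrigation-quality indices; all inputs in meq/l."""
    sar = na / np.sqrt((ca + mg) / 2.0)                 # sodium adsorption ratio
    na_pct = 100.0 * (na + k) / (ca + mg + na + k)      # sodium percentage
    ki = na / (ca + mg)                                 # Kelly index
    pi = 100.0 * (na + np.sqrt(hco3)) / (ca + mg + na)  # permeability index
    return sar, na_pct, ki, pi
```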
Journal Article
Model averaging and muddled multimodel inferences
2015
Three flawed practices associated with model averaging coefficients for predictor variables in regression models commonly occur when making multimodel inferences in analyses of ecological data. Model-averaged regression coefficients based on Akaike information criterion (AIC) weights have been recommended for addressing model uncertainty, but they are not valid, interpretable estimates of partial effects for individual predictors when there is multicollinearity among the predictor variables. Multicollinearity implies that the scaling of units in the denominators of the regression coefficients may change across models such that neither the parameters nor their estimates have common scales, so averaging them makes no sense. The associated sums of AIC model weights recommended to assess relative importance of individual predictors are really a measure of relative importance of models, with little information about contributions by individual predictors compared to other measures of relative importance based on effect size or variance reduction. Sometimes the model-averaged regression coefficients for predictor variables are incorrectly used to make model-averaged predictions of the response variable when the models are not linear in the parameters. I demonstrate the issues with the first two practices using the college grade point average example extensively analyzed by Burnham and Anderson. I show how partial standard deviations of the predictor variables can be used to detect changing scales of their estimates with multicollinearity. Standardizing estimates based on partial standard deviations for their variables can make the scaling of the estimates commensurate across models, a necessary but not sufficient condition for model averaging of the estimates to be sensible. A unimodal distribution of estimates and valid interpretation of individual parameters are additional requisite conditions. The standardized estimates, or equivalently the t statistics on unstandardized estimates, also can be used to provide more informative measures of relative importance than sums of AIC weights. Finally, I illustrate how seriously compromised statistical interpretations and predictions can be for all three of these flawed practices by critiquing their use in a recent species distribution modeling technique developed for predicting Greater Sage-Grouse (Centrocercus urophasianus) distribution in Colorado, USA. These model averaging issues are common in other ecological literature and ought to be discontinued if we are to make effective scientific contributions to ecological knowledge and conservation of natural resources.
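The partial standard deviations discussed here (following Bring) shrink each predictor's standard deviation by its variance inflation factor. A hedged sketch of that rescaling, assuming the fitted coefficients are supplied separately:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

def partial_sd_standardized(X, coefs):
    """Standardize coefficients by Bring's partial standard deviations:
    s*_j = s_j * sqrt(1 / VIF_j) * sqrt((n - 1) / (n - p)).
    X holds the predictor columns only (no intercept column)."""
    n, p = X.shape
    Xc = np.column_stack([np.ones(n), X])     # constant so VIFs are centred
    vif = np.array([variance_inflation_factor(Xc, j + 1) for j in range(p)])
    s = X.std(axis=0, ddof=1)
    partial_sd = s * np.sqrt(1.0 / vif) * np.sqrt((n - 1) / (n - p))
    return coefs * partial_sd
```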
Journal Article
Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression
by Zou, Hui; Kai, Bo; Li, Runze
in Asymptotic efficiency, Bias, Composite quantile regression estimator
2010
Local polynomial regression is a useful non-parametric regression tool to explore fine data structures and has been widely used in practice. We propose a new non-parametric regression technique called local composite quantile regression smoothing to improve local polynomial regression further. Sampling properties of the estimation procedure proposed are studied. We derive the asymptotic bias, variance and normality of the estimate proposed. The asymptotic relative efficiency of the estimate with respect to local polynomial regression is investigated. It is shown that the estimate can be much more efficient than the local polynomial regression estimate for various non-normal errors, while being almost as efficient as the local polynomial regression estimate for normal errors. Simulation is conducted to examine the performance of the estimates proposed. The simulation results are consistent with our theoretical findings. A real data example is used to illustrate the method proposed.
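A minimal sketch of the defining optimization at a single point: q quantile levels share one local slope, and the location estimate averages the local intercepts. The Gaussian kernel, Nelder-Mead solver, and q = 5 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def local_cqr(x, y, x0, h, q=5):
    """Local linear composite quantile regression estimate of m(x0)."""
    taus = np.arange(1, q + 1) / (q + 1)
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)        # Gaussian kernel weights

    def objective(theta):
        a, b = theta[:q], theta[q]                # q intercepts, one shared slope
        r = y[None, :] - a[:, None] - b * (x - x0)[None, :]
        check = r * (taus[:, None] - (r < 0))     # quantile check loss
        return np.sum(w[None, :] * check)

    theta0 = np.concatenate([np.full(q, np.median(y)), [0.0]])
    res = minimize(objective, theta0, method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-6})
    return res.x[:q].mean()                       # average the local intercepts
```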
Journal Article
Standards for Standardized Logistic Regression Coefficients
2011
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a standardized logistic regression coefficient that can be used in the same way across a broad range of problems as the standardized linear regression coefficient, and also to suggest the adequacy of other approaches for limited purposes. This article reviews the state of knowledge regarding the use of standardized coefficients in general and standardized logistic regression coefficients in particular, and makes specific recommendations on how to best use (and avoid abusing) standardized logistic regression coefficients.
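For concreteness, one widely used construction, latent-variable ("y*") standardization, is sketched below; the article compares several such constructions, and this is not necessarily its recommended one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ystar_standardized_coefs(X, y):
    """Latent-variable standardization: b*_j = b_j * sd(x_j) / sd(y*),
    where var(y*) = var(X @ b) + pi^2 / 3 (logistic error variance).
    One candidate approach among those the article surveys."""
    fit = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)  # large C ~ unpenalized
    b = fit.coef_.ravel()
    s_x = X.std(axis=0, ddof=1)
    s_ystar = np.sqrt(np.var(X @ b, ddof=1) + np.pi ** 2 / 3)
    return b * s_x / s_ystar
```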
Journal Article
Nonparametric estimation of conditional medians for linear and related processes
2010
We consider nonparametric estimation of conditional medians for time series data. The time series data are generated from two mutually independent linear processes. The linear processes may show long-range dependence. The estimator of the conditional medians is based on minimizing the locally weighted sum of absolute deviations for local linear regression. We present the asymptotic distribution of the estimator. The rate of convergence is independent of regressors in our setting. The result of a simulation study is also given.
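The criterion here is the single-quantile (median) special case of the composite quantile objective sketched earlier in these results; a compact hedged version, with an Epanechnikov-type kernel as an assumption:

```python
import numpy as np
from scipy.optimize import minimize

def local_linear_median(x, y, x0, h):
    """Conditional-median estimate at x0: minimize the kernel-weighted sum of
    absolute deviations for a local linear fit (a sketch of the criterion,
    not the authors' implementation)."""
    w = np.maximum(1.0 - ((x - x0) / h) ** 2, 0.0)   # Epanechnikov-type weights

    def objective(theta):
        a, b = theta
        return np.sum(w * np.abs(y - a - b * (x - x0)))

    res = minimize(objective, x0=[np.median(y), 0.0], method="Nelder-Mead")
    return res.x[0]   # estimated conditional median at x0
```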
Journal Article
Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions
2017
Data subject to heavy-tailed errors are commonly encountered in various scientific fields. To address this problem, procedures based on quantile regression and least absolute deviation regression have been developed in recent years. These methods essentially estimate the conditional median (or quantile) function, which can be very different from the conditional mean function, especially when distributions are asymmetric and heteroscedastic. How can we efficiently estimate the mean regression function in ultrahigh-dimensional settings when only the second moment exists? To solve this problem, we propose a penalized Huber loss with a diverging parameter to reduce the biases created by the traditional Huber loss. Such a penalized robust approximate (RA) quadratic loss will be called the RA lasso. In the ultrahigh-dimensional setting, where the dimensionality can grow exponentially with the sample size, our results reveal that the RA lasso estimator produces a consistent estimator at the same rate as the optimal rate under the light-tail situation. We further study the computational convergence of the RA lasso and show that the composite gradient descent algorithm indeed produces a solution that admits the same optimal rate after sufficient iterations. As a by-product, we also establish a concentration inequality for estimating the population mean when only the second moment exists. We compare the RA lasso with other regularized robust estimators based on quantile regression and least absolute deviation regression. Extensive simulation studies demonstrate the satisfactory finite-sample performance of the RA lasso.
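The composite gradient descent mentioned in the abstract is a proximal gradient scheme: a gradient step on the Huber loss followed by soft thresholding. A hedged sketch, with the step size and the robustification parameter alpha treated as user-supplied assumptions:

```python
import numpy as np

def ra_lasso(X, y, lam, alpha, iters=500):
    """Composite (proximal) gradient descent for Huber loss + l1 penalty.

    alpha is the robustification parameter; the paper lets it diverge with n
    to reduce the bias of the classical Huber loss. A sketch of the
    optimization scheme, not the authors' code.
    """
    n, p = X.shape
    L = np.linalg.eigvalsh(X.T @ X / n).max()   # Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(iters):
        r = y - X @ beta
        psi = np.clip(r, -alpha, alpha)         # Huber derivative: clip residuals
        grad = -X.T @ psi / n
        z = beta - grad / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return beta
```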
Journal Article