4,109 result(s) for "variable screening"
Partition-based ultrahigh-dimensional variable screening
Traditional variable screening methods treat each covariate independently and thereby overlook useful information carried by covariates with similar functionality or spatial proximity. Leveraging prior grouping information on covariates, we propose partition-based screening methods for ultrahigh-dimensional variables in the framework of generalized linear models. We show that partition-based screening exhibits the sure screening property with a vanishing false selection rate. For settings in which prior knowledge of covariate grouping is unavailable or unreliable, we propose a data-driven partition screening framework and investigate its theoretical properties. We consider two special cases: correlation-guided partitioning and spatial location-guided partitioning. When no single partition is clearly preferable, we propose a theoretically justified strategy for combining statistics from multiple partitioning schemes. The utility of the proposed methods is demonstrated via simulation and an analysis of functional neuroimaging data.
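A minimal sketch of the group-wise idea behind partition-based screening, assuming a linear model: score each covariate group by the norm of its within-group marginal correlations and keep the top-scoring groups. The paper's actual statistic is built within generalized linear models, and the group labels and data here are invented for illustration.

```python
import numpy as np

def group_screen(X, y, groups, keep=2):
    """Rank covariate groups by the size-normalized norm of their marginal
    correlations with y and keep the top `keep` groups (a simplified
    stand-in for the paper's partition-based GLM screening statistic)."""
    Xs = (X - X.mean(0)) / X.std(0)
    ys = (y - y.mean()) / y.std()
    corr = Xs.T @ ys / len(y)                      # marginal Pearson correlations
    scores = {g: np.linalg.norm(corr[idx]) / np.sqrt(len(idx))
              for g, idx in groups.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 12))
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(500)   # signal lives in group "a"
groups = {"a": [0, 1, 2], "b": [3, 4, 5], "c": [6, 7, 8], "d": [9, 10, 11]}
print(group_screen(X, y, groups, keep=1))  # the signal group "a" should rank first
```

Screening at the group level pools the weak marginal signals of related covariates, which is exactly the information a covariate-by-covariate screen discards.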
Rapid non-destructive identification of selenium-enriched millet based on hyperspectral imaging technology
To enable rapid, non-destructive identification of selenium-enriched agricultural products, selenium-enriched millet and ordinary millet were taken as study objects. Based on hyperspectral imaging technology, image regions of interest (ROI) were selected and the average spectra extracted. After noise reduction with the Savitzky-Golay (SG) smoothing algorithm, the variables screened by the successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), uninformative variable elimination (UVE), CARS-SPA, UVE-SPA, and UVE-CARS were used as inputs, and the sample classes as outputs, to build support vector machine (SVM) models. The results showed that the accuracy of CARS-SPA-SVM was 100% in the training set and 99.58% in the test set, equivalent to that of CARS-SVM and UVE-CARS-SVM and higher than that of SPA-SVM, UVE-SPA-SVM, and UVE-SVM. Therefore, CARS-SPA was superior, and CARS-SPA-SVM was suitable for identifying selenium-enriched millet. Finally, 454.57 nm, 484.98 nm, 885.34 nm, and 937.1 nm, obtained by the wavelength extraction algorithms, were considered the wavelengths sensitive to selenium information. This study provides a reference for the identification of selenium-enriched agricultural products.
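A simplified sketch of the first two stages of such a pipeline on synthetic spectra, using scipy's `savgol_filter` for SG smoothing. Plain correlation ranking stands in for CARS/SPA/UVE (which are considerably more involved), and the band index carrying the "selenium" signal is invented for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)
n_samples, n_bands = 200, 120
labels = rng.integers(0, 2, n_samples)            # 0 = ordinary, 1 = Se-enriched
spectra = rng.standard_normal((n_samples, n_bands))
spectra[:, 40] += 2.0 * labels                    # pretend band 40 carries Se info

# Step 1: Savitzky-Golay smoothing along the wavelength axis.
smoothed = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)

# Step 2: screen wavelengths by |correlation| with the class label
# (a plain stand-in for the CARS/SPA/UVE family of variable selectors).
z = (smoothed - smoothed.mean(0)) / smoothed.std(0)
corr = np.abs(z.T @ (labels - labels.mean()) / labels.std() / n_samples)
selected = np.argsort(corr)[::-1][:5]
print(sorted(int(b) for b in selected))  # the injected band 40 should appear
```

The selected bands would then feed an SVM classifier; smoothing first matters because band-wise screening is sensitive to high-frequency sensor noise.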
THE FUSED KOLMOGOROV FILTER: A NONPARAMETRIC MODEL-FREE SCREENING METHOD
A new model-free screening method called the fused Kolmogorov filter is proposed for high-dimensional data analysis. This new method is fully nonparametric and can work with many types of covariates and response variables, including continuous, discrete and categorical variables. We apply the fused Kolmogorov filter to deal with variable screening problems emerging from a wide range of applications, such as multiclass classification, nonparametric regression and Poisson regression, among others. It is shown that the fused Kolmogorov filter enjoys the sure screening property under weak regularity conditions that are much milder than those required for many existing nonparametric screening methods. In particular, the fused Kolmogorov filter can still be powerful when covariates are strongly dependent on each other. We further demonstrate the superior performance of the fused Kolmogorov filter over existing screening methods by simulations and real data examples.
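A minimal sketch of the Kolmogorov-filter idea for a categorical response: score each covariate by the largest two-sample Kolmogorov-Smirnov distance between its conditional distributions across response classes. The actual fused filter also handles continuous responses by fusing several slicings of the response, which this sketch omits; the data are synthetic.

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between samples a and b."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

def kolmogorov_filter(X, y, keep=3):
    """Score each covariate by the largest KS distance between its
    conditional distributions across response classes; keep the top ones."""
    classes = np.unique(y)
    scores = []
    for j in range(X.shape[1]):
        d = max(ks_distance(X[y == a, j], X[y == b, j])
                for i, a in enumerate(classes) for b in classes[i + 1:])
        scores.append(d)
    return np.argsort(scores)[::-1][:keep]

rng = np.random.default_rng(2)
y = rng.integers(0, 3, 300)                 # three-class response
X = rng.standard_normal((300, 10))
X[:, 0] += y                                # covariate 0 shifts with the class
print(kolmogorov_filter(X, y))              # covariate 0 should rank first
```

Because the score depends only on empirical CDFs, it needs no model for the regression function and is invariant to monotone transformations of each covariate.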
A Generic Sure Independence Screening Procedure
Extracting important features from ultra-high dimensional data is one of the primary tasks in statistical learning, information theory, precision medicine, and biological discovery. Many of the sure independence screening methods developed to meet these needs are suitable only for particular models under restrictive assumptions. With the availability of more data types and possible models, a model-free generic screening procedure with fewer and less restrictive assumptions is desirable. In this article, we propose a generic nonparametric sure independence screening procedure, called BCor-SIS, on the basis of a recently developed universal dependence measure: Ball correlation. We show that the proposed procedure has strong screening consistency even when the dimensionality is of exponential order in the sample size, without imposing sub-exponential moment assumptions on the data. We investigate the flexibility of this procedure by considering three commonly encountered challenging settings in biological discovery or precision medicine: iterative BCor-SIS, interaction pursuit, and survival outcomes. We use simulation studies and real data analyses to illustrate the versatility and practicability of our BCor-SIS method. Supplementary materials for this article are available online.
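BCor-SIS follows the generic sure-independence-screening template: compute a marginal utility for every covariate and keep the top d. Ball correlation itself is not implemented below; absolute Pearson correlation stands in as the utility, so this is only the scaffolding of the procedure, not the paper's dependence measure.

```python
import numpy as np

def sis(X, y, utility, d):
    """Generic sure-independence-screening template: rank covariates by a
    marginal utility against y and keep the indices of the top d.
    BCor-SIS plugs Ball correlation in as `utility`."""
    scores = np.array([utility(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]

# Illustrative placeholder utility; Ball correlation would replace this.
pearson = lambda x, y: abs(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(3)
X = rng.standard_normal((400, 50))
y = 2 * X[:, 7] - X[:, 13] + 0.5 * rng.standard_normal(400)
kept = sis(X, y, pearson, d=5)
print(sorted(int(j) for j in kept))  # should include the active covariates 7 and 13
```

The appeal of the template is that the utility is swappable: a model-free measure like Ball correlation extends the same O(np) screen to discrete, categorical, or even metric-space-valued data.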
Estimating red soil moisture using optimized spectral indices and machine learning
【Objective】Efficient monitoring of soil moisture at large scales is required for optimizing water resource management and smart agriculture, particularly in red soils, where water retention is low and irrigation efficiency is limited. This paper investigates the feasibility of using multispectral remote sensing to indirectly measure the moisture content of red soils. 【Method】The study area was in Yunnan province. Using unmanned aerial vehicle (UAV) multispectral images (green, red, red-edge, and near-infrared bands) and moisture data measured in the field, we selected 22 classical and improved spectral indices to construct an inversion model. Sensitive indices were screened using three algorithms: the Pearson correlation coefficient (Pccs), variable importance in projection (VIP), and grey relational analysis (GRA). Four machine learning models, namely random forest (RF), back-propagation neural network (BPNN), support vector regression (SVR), and light gradient boosting machine (LightGBM), were used to estimate soil moisture content from the optimized indices. 【Result】The VIP algorithm screened out six optimized spectral variables, which significantly improved computational efficiency. Among the four machine learning models compared, the BPNN was the most robust and generalizable. The combination of VIP and BPNN was the most accurate, with R2 = 0.72, RMSE = 3.36%, and RPD = 1.90 against measured field data. The R2 of the RF model was 0.94 in the training set but fell to 0.56 in the test set, indicating overfitting. 【Conclusion】The multispectral inversion model using VIP and BPNN effectively captured the spatiotemporal distribution of red soil moisture in the study area. When combined with additional spectral bands and environmental parameters, this model can be applied in smart agriculture and ecological management.
Surrogate modeling: tricks that endured the test of time and some recent developments
Tasks such as analysis, design optimization, and uncertainty quantification can be computationally expensive. Surrogate modeling is often the tool of choice for reducing the burden associated with such data-intensive tasks. However, even after years of intensive research, surrogate modeling still involves a struggle to achieve maximum accuracy within limited resources. This work summarizes various advanced, yet often straightforward, statistical tools that help. We focus on four techniques with increasing popularity in the surrogate modeling community: (i) variable screening and dimensionality reduction in both the input and the output spaces, (ii) data sampling techniques or design of experiments, (iii) simultaneous use of multiple surrogates, and (iv) sequential sampling. We close the paper with some suggestions for future research.
Improving random forest predictions in small datasets from two-phase sampling designs
Background: While random forests are one of the most successful machine learning methods, their performance must be optimized for use with datasets resulting from a two-phase sampling design with a small number of cases, a common situation in biomedical studies, which often have rare outcomes and covariates whose measurement is resource-intensive. Methods: Using an immunologic marker dataset from a phase III HIV vaccine efficacy trial, we seek to optimize random forest prediction performance using combinations of variable screening, class balancing, weighting, and hyperparameter tuning. Results: Our experiments show that class balancing improves random forest prediction performance when variable screening is not applied but hurts performance when it is. The impact of weighting similarly depends on whether variable screening is applied. Hyperparameter tuning is ineffective at these small sample sizes. We further show that random forests under-perform generalized linear models for some subsets of markers, that prediction performance on this dataset can be improved by stacking random forests and generalized linear models trained on different subsets of predictors, and that the extent of improvement depends critically on the dissimilarities between candidate learner predictions. Conclusion: In small datasets from two-phase sampling designs, variable screening and inverse sampling-probability weighting are important for achieving good prediction performance of random forests. In addition, stacking random forests and simple linear models can offer improvements over random forests alone.
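A minimal sketch of the stacking idea on synthetic imbalanced data, using scikit-learn's `StackingClassifier` to combine a random forest with a logistic-regression GLM. Unlike the study, both base learners see the same predictors here, and the dataset is simulated rather than drawn from a two-phase design.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small, imbalanced synthetic dataset standing in for the rare-outcome setting.
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stack a random forest and a GLM; a second GLM combines their predictions.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("glm", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(round(acc, 3))
```

Stacking pays off most when the base learners err on different observations, which matches the abstract's point that the gain depends on the dissimilarity between candidate learner predictions.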
SPARSE COMPOSITE QUANTILE REGRESSION WITH ULTRAHIGH-DIMENSIONAL HETEROGENEOUS DATA
Although quantile regressions are widely employed for heterogeneous data, simultaneously selecting covariates that globally affect the response and estimating the coefficients is very challenging. We introduce a novel sparse composite quantile regression screening method for the analysis of ultrahigh-dimensional heterogeneous data. The proposed method enjoys the sure screening property, provides a consistent selection path, and yields consistent estimates of the coefficients simultaneously across a continuous range of quantile levels. An extended Bayesian information criterion is employed to select the “best” candidate from the path. Extensive simulation studies demonstrate the effectiveness of the proposed method, and an application to a gene expression data set is provided.
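A crude illustration of why combining several quantile levels helps with heterogeneous data: a covariate that only changes the spread of the response is invisible to a median-based utility but shows up in the tail quantiles. The contrast statistic below is an invented stand-in, not the paper's sparse composite quantile regression estimator.

```python
import numpy as np

def quantile_contrast(x, y, taus=(0.25, 0.5, 0.75)):
    """Marginal utility: total difference in the conditional quantiles of y
    between the lower and upper halves of x, summed over several levels
    (a crude stand-in for a composite quantile regression statistic)."""
    lo, hi = y[x <= np.median(x)], y[x > np.median(x)]
    return sum(abs(np.quantile(hi, t) - np.quantile(lo, t)) for t in taus)

rng = np.random.default_rng(5)
X = rng.standard_normal((600, 30))
# Heterogeneous response: X[:, 4] shifts the location, while X[:, 9] flips
# the noise scale, so X[:, 9] is invisible at the median but not in the tails.
y = X[:, 4] + (0.3 + 2.7 * (X[:, 9] > 0)) * rng.standard_normal(600)

scores = np.array([quantile_contrast(X[:, j], y) for j in range(30)])
top = np.argsort(scores)[::-1][:5]
print(sorted(int(j) for j in top))  # both 4 and 9 should be retained
```

A single-quantile screen at tau = 0.5 would miss covariate 9 entirely; pooling quantile levels is what lets composite methods detect covariates that act globally on the response distribution.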
THE Lq-NORM LEARNING FOR ULTRAHIGH-DIMENSIONAL SURVIVAL DATA
In the era of precision medicine, survival outcome data with high-throughput predictors are routinely collected. Models with an exceedingly large number of covariates are either infeasible to fit or likely to incur low predictability because of overfitting. Variable screening is crucial to identifying and removing irrelevant attributes. Although numerous screening methods have been proposed, most rely on particular modeling assumptions. Motivated by a study on detecting gene signatures for the survival of patients with multiple myeloma, we propose a model-free Lq-norm learning procedure, which includes the well-known Cramér–von Mises and Kolmogorov criteria as two special cases. This work provides an integrative framework for detecting predictors with various levels of impact, such as short- or long-term impacts, on censored outcome data. The framework leads naturally to a scheme that combines results from different values of q to reduce false negatives, an aspect often overlooked in the current literature. We show that our method possesses sure screening properties. The utility of the proposed method is confirmed using simulation studies and an analysis of the multiple myeloma study.
Variable Screening Optimization Algorithm for Mahalanobis-Taguchi System
This paper proposes a variable screening optimization method for the Mahalanobis-Taguchi system based on binary quantum-behaved particle swarm optimization. The main procedure is as follows. First, Mahalanobis distance values are calculated using the Gram-Schmidt orthogonalization method. A multi-objective mixed programming model is then built, and the binary quantum-behaved particle swarm optimization algorithm is used to solve for the optimal variable combination. Finally, a new prediction system based on the Mahalanobis-Taguchi metric is established to accomplish accurate discrimination.
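A minimal sketch of the Mahalanobis-Taguchi building block: the Mahalanobis distance of an observation from a reference ("normal") sample. The paper computes it via Gram-Schmidt orthogonalization; this sketch uses the inverse covariance matrix directly, which yields the same distance, and omits the particle-swarm variable search entirely. The data are synthetic.

```python
import numpy as np

def mahalanobis(x, X_ref):
    """Mahalanobis distance of observation x from the reference sample X_ref
    (the MTS 'normal' group), computed via the inverse covariance matrix;
    Gram-Schmidt orthogonalization would give the same value."""
    mu = X_ref.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X_ref, rowvar=False))
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(4)
normal = rng.standard_normal((200, 3))               # reference observations
print(mahalanobis(normal.mean(axis=0), normal))      # → 0.0 at the centroid
print(mahalanobis(np.array([5.0, 5.0, 5.0]), normal))  # far point scores large
```

In MTS, variable screening then searches for the subset of variables whose Mahalanobis metric best separates abnormal observations from the reference group, which is the combinatorial problem the binary particle swarm is solving.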