Catalogue Search | MBRL
2,391 result(s) for "high dimensional data"
Shrinkage Estimation Strategies in Generalised Ridge Regression Models
by Arashi, Mohammad; Yüzbaşı, Bahadır; Ahmed, S. Ejaz
in Computer simulation, Data analysis, Estimators
2020
In this study, we suggest pretest and shrinkage methods based on generalised ridge regression estimation that are suitable for both multicollinear and high-dimensional problems. We review and develop theoretical results for some of the shrinkage estimators. The performance of the shrinkage estimators relative to some penalty methods is compared and assessed by both simulation and real-data analysis. We show that the suggested methods can be regarded as good competitors to regularisation techniques in terms of the mean squared error of estimation and the prediction error. A thorough comparison of pretest and shrinkage estimators based on the maximum likelihood method with the penalty methods was given in earlier work. In this paper, we extend the comparison outlined in that work using the least squares method for generalised ridge regression.
Journal Article
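The generalised ridge estimator on which the entry above builds is usually written β̂(K) = (XᵀX + K)⁻¹Xᵀy, where K = diag(k₁, …, k_p) holds a separate ridge parameter for each coefficient. A minimal plain-Python sketch of that formula follows; `solve` and `generalised_ridge` are hypothetical names, the ridge parameters are supplied by hand rather than estimated, and the paper's pretest and shrinkage estimators themselves are not reproduced.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def generalised_ridge(X: Matrix, y: List[float], k: List[float]) -> List[float]:
    """Hypothetical helper: beta(K) = (X^T X + K)^{-1} X^T y with K = diag(k)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for i in range(p):
        xtx[i][i] += k[i]  # one ridge parameter per coefficient
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]  # exactly 2*x1 + 3*x2
print(generalised_ridge(X, y, [0.0, 0.0]))    # K = 0 gives least squares
print(generalised_ridge(X, y, [10.0, 10.0]))  # larger k shrinks towards zero
```

With K = 0 the estimator reduces to ordinary least squares; increasing the entries of K pulls the coefficients towards zero, which is what stabilises the fit when the columns of X are nearly collinear.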
Health Monitoring of Large-Scale Civil Structures: An Approach Based on Data Partitioning and Classical Multidimensional Scaling
by Sarmadi, Hassan; Behkamal, Behshid; Mariani, Stefano
in classical multidimensional scaling, data-driven method, high-dimensional data
2021
A major challenge in structural health monitoring (SHM) is the efficient handling of big data, namely of high-dimensional datasets, when damage detection under environmental variability is being assessed. To address this issue, a novel data-driven approach to early damage detection is proposed here. The approach is based on an efficient partitioning of the dataset that gathers the sensor recordings, and on classical multidimensional scaling (CMDS). The partitioning procedure aims at moving towards a low-dimensional feature space, while the CMDS algorithm is used to set the coordinates in this low-dimensional space and to define damage indices through norms of those coordinates. The proposed approach is shown to efficiently and robustly address the challenges linked to high-dimensional datasets and environmental variability. Results for two large-scale test cases are reported: the ASCE structure and the Z24 bridge. A high sensitivity to damage and a limited (if any) number of false alarms and false detections are reported, attesting to the efficacy of the proposed data-driven approach.
Journal Article
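Classical multidimensional scaling, the second ingredient of the approach above, recovers low-dimensional coordinates from a matrix of pairwise distances by double-centring the squared distances and keeping the leading eigenvectors. The sketch below is a generic textbook version, assuming Euclidean input distances and using power iteration with deflation so that it needs only the standard library; the dataset partitioning and the norm-based damage indices described in the abstract are not implemented, and all function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def double_centre(d: Matrix) -> Matrix:
    """B = -1/2 J D^2 J, where D holds pairwise Euclidean distances."""
    n = len(d)
    sq = [[v * v for v in row] for row in d]
    mean = [sum(row) / n for row in sq]  # row means = column means (D symmetric)
    grand = sum(mean) / n
    return [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
            for i in range(n)]


def top_eigenpair(b: Matrix, iters: int = 1000, seed: int = 0):
    """Leading eigenpair of a symmetric positive semidefinite matrix by power iteration."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def classical_mds(d: Matrix, k: int) -> Matrix:
    """Embed n points in k dimensions from their distance matrix d."""
    b = double_centre(d)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][c] = scale * v[i]
        # deflate so that the next pass finds the next eigenpair
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


D = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]  # a 3-4-5 triangle
print(classical_mds(D, 2))
```

When the distances come from points that really live in k dimensions, the recovered coordinates reproduce them exactly, up to rotation and reflection; on high-dimensional data, keeping only the leading eigenvectors gives the low-dimensional representation.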
Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure
2015
We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It is well suited to many biological studies that detect associations between multiple traits and multiple predictors, where each trait and each predictor is embedded in biological functional groups such as genes, pathways or brain regions. The method can effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large-p, small-n problems, and is flexible in handling various complex group structures, such as overlapping, nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study.
Journal Article
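The combined group-level and within-group sparsity described above is commonly obtained through the proximal operator of the sparse group lasso penalty λ₁‖β‖₁ + λ₂ Σ_g ‖β_g‖₂: each coefficient is first soft-thresholded, then each group is shrunk as a block and dropped entirely if its norm falls below λ₂. The sketch assumes non-overlapping groups given as index lists over the flattened coefficient matrix and omits any group-size weights; it is not the authors' algorithm, which also handles overlapping and nested groups, and the function names are hypothetical.

```python
import math
from typing import List


def soft_threshold(z: float, t: float) -> float:
    """Lasso shrinkage of a single coefficient."""
    if abs(z) <= t:
        return 0.0
    return math.copysign(abs(z) - t, z)


def sparse_group_prox(beta: List[float], groups: List[List[int]],
                      lam1: float, lam2: float) -> List[float]:
    """Proximal step for lam1*||b||_1 + lam2*sum_g ||b_g||_2 (disjoint groups)."""
    out = list(beta)
    for g in groups:
        s = [soft_threshold(beta[i], lam1) for i in g]  # within-group sparsity
        norm = math.sqrt(sum(x * x for x in s))
        # group-level shrinkage: the whole group is zeroed if norm <= lam2
        scale = max(1.0 - lam2 / norm, 0.0) if norm > 0.0 else 0.0
        for i, x in zip(g, s):
            out[i] = scale * x
    return out


# coefficient matrix flattened to a vector; two groups of two coefficients
print(sparse_group_prox([3.0, 0.2, 0.1, -0.1], [[0, 1], [2, 3]], 0.5, 0.5))
```

In this call the second group is removed entirely, while in the surviving first group the small second coefficient is set to zero individually, mirroring the two levels of selection the abstract describes.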
Bagging and deep learning in optimal individualized treatment rules
by Zou, Fei; Mi, Xinlei; Zhu, Ruoqing
in Artificial neural networks, BIOMETRIC PRACTICE, biometry
2019
An ENsemble Deep Learning Optimal Treatment (EndLot) approach is proposed for personalized medicine problems. The statistical framework of the proposed method is based on the outcome weighted learning (OWL) framework, which transforms the optimal decision rule problem into a weighted classification problem. We further employ an ensemble of deep neural networks (DNNs) to learn the optimal decision rule. Utilizing the flexibility of DNNs and the stability of bootstrap aggregation, the proposed method achieves a considerable improvement over existing methods. An R package "ITRlearn" is developed to implement the proposed method. Numerical performance is demonstrated via simulation studies and a real data analysis of the Cancer Cell Line Encyclopedia data.
Journal Article
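Outcome weighted learning, as summarised in the abstract above, turns treatment selection into weighted classification: each subject's observed treatment becomes the class label, and its reward divided by the probability of receiving that treatment becomes the case weight. The sketch below shows only that transformation and a bootstrap-aggregation (bagging) vote over already-fitted rules; it assumes treatments coded -1/+1, non-negative rewards and known propensities, and it stands in for, rather than reproduces, the deep-network ensemble of EndLot or the ITRlearn package. Function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# (covariates x, treatment a in {-1, +1}, reward r >= 0, P(A = a | x))
Record = Tuple[Sequence[float], int, float, float]
Rule = Callable[[Sequence[float]], int]


def owl_transform(records: List[Record]) -> List[Tuple[Sequence[float], int, float]]:
    """Return (x, label, weight) triples for a weighted classifier."""
    out = []
    for x, a, reward, propensity in records:
        if propensity <= 0.0:
            raise ValueError("propensity must be positive")
        # inverse-probability weighting of the observed reward
        out.append((x, a, reward / propensity))
    return out


def bagged_rule(rules: List[Rule]) -> Rule:
    """Majority vote over rules fitted on bootstrap resamples (ties go to +1)."""
    def vote(x: Sequence[float]) -> int:
        return 1 if sum(rule(x) for rule in rules) >= 0 else -1
    return vote


data = owl_transform([([0.5], 1, 2.0, 0.5), ([1.0], -1, 3.0, 0.75)])
print(data)
```

Any classifier that accepts case weights can then be fitted to the transformed triples on each bootstrap resample, and `bagged_rule` combines the fitted rules into a single recommendation.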
Greedy Outcome Weighted Tree Learning of Optimal Personalized Treatment Rules
by Zhao, Hongyu; Chen, Guanhua; Zhao, Ying-Qi
in Algorithms, artificial intelligence, Aversion learning
2017
We propose a subgroup identification approach for inferring optimal and interpretable personalized treatment rules with high-dimensional covariates. Our approach is based on a two-step greedy tree algorithm to pursue signals in a high-dimensional space. In the first step, we transform the treatment selection problem into a weighted classification problem that can utilize tree-based methods. In the second step, we adopt a newly proposed tree-based method, known as reinforcement learning trees, to detect features involved in the optimal treatment rules and to construct binary splitting rules. The method is further extended to right-censored survival data by using the accelerated failure time model and introducing double weighting to the classification trees. The performance of the proposed method is demonstrated via simulation studies, as well as analyses of the Cancer Cell Line Encyclopedia (CCLE) data and the Tamoxifen breast cancer data.
Journal Article
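The second step above relies on tree-based classifiers fitted to the weighted problem. As a much simpler stand-in for reinforcement learning trees, the sketch below searches a single covariate for the threshold that minimises weighted misclassification of the treatment labels, which is the basic operation behind one binary splitting rule. It assumes labels coded -1/+1 and non-negative weights such as those produced by an outcome-weighted transformation; the names are hypothetical.

```python
from typing import List, Optional, Tuple


def node_decision(pos: float, neg: float) -> Tuple[int, float]:
    """Predict the label with more total weight; the other weight is lost."""
    return (1, neg) if pos >= neg else (-1, pos)


def best_weighted_split(xs: List[float], labels: List[int],
                        weights: List[float]) -> Optional[Tuple[float, int, int, float]]:
    """Return (threshold, left_label, right_label, loss) for the rule 'x <= threshold'."""
    best = None
    for t in sorted(set(xs))[:-1]:  # keep both child nodes non-empty
        left = [0.0, 0.0]   # [weight with label +1, weight with label -1]
        right = [0.0, 0.0]
        for x, a, w in zip(xs, labels, weights):
            side = left if x <= t else right
            side[0 if a == 1 else 1] += w
        left_label, left_loss = node_decision(*left)
        right_label, right_loss = node_decision(*right)
        loss = left_loss + right_loss
        if best is None or loss < best[3]:
            best = (t, left_label, right_label, loss)
    return best


print(best_weighted_split([1.0, 2.0, 3.0, 4.0], [-1, -1, 1, 1], [1.0, 1.0, 1.0, 1.0]))
```

A full tree repeats this search over covariates and recursively within each child node; the sketch returns `None` when the covariate takes a single value and no split exists.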
Doubly Robust Matching Estimators for High Dimensional Confounding Adjustment
2018
Valid estimation of treatment effects from observational data requires proper control of confounding. If the number of covariates is large relative to the number of observations, then controlling for all available covariates is infeasible. In cases where a sparsity condition holds, variable selection or penalization can reduce the dimension of the covariate space in a manner that allows for valid estimation of treatment effects. In this article, we propose matching on both the estimated propensity score and the estimated prognostic scores when the number of covariates is large relative to the number of observations. We derive asymptotic results for the matching estimator and show that it is doubly robust in the sense that only one of the two score models needs to be correct to obtain a consistent estimator. Simulations demonstrate its effectiveness in controlling for confounding and highlight its potential to address nonlinear confounding. Finally, we apply the proposed procedure to analyze the effect of gender on prescription opioid use using insurance claims data.
Journal Article
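The matching idea above pairs each unit with units from the other arm that are close on both the estimated propensity score and the estimated prognostic score. A minimal sketch, assuming both scores have already been estimated (for example with penalised regressions), one nearest control per treated unit chosen with replacement, and unscaled Euclidean distance in the two-dimensional score space; it estimates the average effect on the treated units only, and the paper's exact matching scheme and variance estimation are not reproduced. The function name is hypothetical.

```python
import math
from typing import List, Tuple

# (estimated propensity score, estimated prognostic score, observed outcome)
Unit = Tuple[float, float, float]


def dr_matching_att(treated: List[Unit], controls: List[Unit]) -> float:
    """Average treated-minus-matched-control outcome difference."""
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    diffs = []
    for ps, prog, y in treated:
        # nearest control in the (propensity, prognostic) plane, with replacement
        match = min(controls, key=lambda c: math.hypot(c[0] - ps, c[1] - prog))
        diffs.append(y - match[2])
    return sum(diffs) / len(diffs)


treated = [(0.6, 1.0, 5.0), (0.3, 2.0, 6.0)]
controls = [(0.58, 1.1, 3.0), (0.31, 1.9, 5.5), (0.9, 5.0, 0.0)]
print(dr_matching_att(treated, controls))
```

Matching on the pair of scores is what gives the double robustness described in the abstract: if either score model is correct, units that are close in this plane are comparable with respect to confounding.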
Statistical Inference, Learning and Models in Big Data
by Selvitella, Alessandro; Gil, Einat; Hendricks, Dieter
in aggregation, Big Data, computational complexity
2016
The need for new methods to deal with big data is a common theme in most scientific fields, although its definition tends to vary with the context. Statistical ideas are an essential part of this, and as a partial response, a thematic program on statistical inference, learning and models in big data was held in 2015 in Canada, under the general direction of the Canadian Statistical Sciences Institute, with major funding from, and most activities located at, the Fields Institute for Research in Mathematical Sciences. This paper gives an overview of the topics covered, describing challenges and strategies that seem common to many different areas of application and including some examples of applications to make these challenges and strategies more concrete.
Journal Article
Likelihood Ratio Tests for High-Dimensional Normal Distributions
by Jiang, Tiefeng; Qi, Yongcheng
in central limit theorem, covariance matrix, high-dimensional data
2015
In their recent work, Jiang and Yang studied six classical likelihood ratio test statistics in the high-dimensional setting. Assuming that a random sample of size n is observed from a p-dimensional normal population, they derived central limit theorems (CLTs) when p and n are proportional to each other; these limits differ from the classical chi-square limits obtained as n goes to infinity while p remains fixed. In this paper, by developing a new tool, we prove that the six CLTs hold in a more applicable setting: p goes to infinity, and p can be very close to n. This is almost a necessary and sufficient condition for the CLTs. Simulated histograms, comparisons of size and power with the classical chi-square approximations, and a discussion are also presented.
Journal Article
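As a concrete instance of a classical likelihood ratio statistic of the kind discussed above, consider testing H₀: Σ = I_p for a normal sample with unknown mean. With S the maximum-likelihood covariance estimate (divisor n), one standard form is -2 log Λ = n[tr(S) - log det(S) - p], which for fixed p is asymptotically chi-square with p(p + 1)/2 degrees of freedom as n goes to infinity; the high-dimensional CLTs in the paper replace this limit with a normal one after a centring and scaling that are not reproduced here. The sketch computes the statistic with a Cholesky log-determinant; function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def log_det_spd(a: Matrix) -> float:
    """log det(a) for a symmetric positive definite matrix, via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def lrt_identity_covariance(s: Matrix, n: int) -> float:
    """-2 log(LR) for H0: Sigma = I_p, with s the MLE covariance (divisor n)."""
    p = len(s)
    return n * (sum(s[i][i] for i in range(p)) - log_det_spd(s) - p)


def classical_df(p: int) -> int:
    """Degrees of freedom of the fixed-p chi-square limit."""
    return p * (p + 1) // 2


print(lrt_identity_covariance([[2.0, 1.0], [1.0, 2.0]], n=10), classical_df(2))
```

The statistic is zero when S equals the identity and grows as S departs from it; the point of the paper is that the chi-square reference distribution computed by `classical_df` becomes unreliable once p is large relative to n.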
Distance‐Based Unsupervised Local Outlier Detection: Based Values Analysis to Improve Outlier Detection Using Machine Learning
2025
Machine learning faces challenges in detecting outliers, especially in high‐dimensional datasets. Good data quality is crucial for reliable results, and many algorithms identify outliers by analysing the outlying aspects of data objects relative to other objects within the dataset. The proposed Advanced Distance‐Based Unsupervised Local Outlier Detection (DU‐LOD) method improves this process by continuously evaluating and identifying outliers using unsupervised learning and distance‐based calculations. DU‐LOD identifies outliers by comparing differences between data objects and their neighbours, making it the first method to combine unsupervised local outlier detection with nearest cluster point identification. Experimental analysis, with an accuracy of 96.12%, a detection rate of 41.89%, a precision of 56.12% and a recall of 1.79%, shows that our model performs best across the various parameters compared with other existing algorithms. Measures such as the area under the ROC curve (AUC), precision and recall are more appropriate in such a scenario. Across these performance measures, the experiments indicate that our model performs best to date among the existing algorithms compared.
Journal Article
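DU‐LOD's exact scoring rule is not given in the abstract, so the sketch below shows only the generic distance-based idea it builds on: score each point by its mean distance to its k nearest neighbours, so that isolated points receive large scores. It is a standard k-nearest-neighbour baseline, not DU‐LOD; the function name and the choice of mean distance are assumptions.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: List[Sequence[float]], k: int) -> List[float]:
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(pts, k=2)
print(max(range(len(pts)), key=scores.__getitem__))  # index of the most isolated point
```

This brute-force version costs O(n²) distance evaluations; in high dimensions, distances also tend to concentrate, which is one reason the abstract treats outlier detection there as difficult.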
Fast Bayesian inference in large Gaussian graphical models
2019
Despite major methodological developments, Bayesian inference in Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing, which bypasses the exploration of the model space. Specifically, we introduce closed-form Bayes factors under the Gaussian conjugate model to evaluate the null hypotheses of marginal and conditional independence between variables. Their computation for all pairs of variables is shown to be extremely efficient, thereby allowing us to address large problems with thousands of nodes as required by modern applications. Moreover, we derive exact tail probabilities from the null distributions of the Bayes factors. These allow the use of any multiplicity correction procedure to control error rates for incorrect edge inclusion. We demonstrate the proposed approach on various simulated examples as well as on a large gene expression data set from The Cancer Genome Atlas.
Journal Article
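Once tail probabilities are available for every pair of variables, any multiplicity correction can be applied to decide which edges to include. The Benjamini–Hochberg step-up procedure, which controls the false discovery rate, is one common choice; the sketch below assumes one p-value per candidate edge, is not taken from the paper, and uses hypothetical gene names and illustrative numbers rather than results.

```python
from typing import List


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the BH step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank  # largest rank meeting its threshold so far
    return sorted(order[:k])


edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3")]  # hypothetical variable pairs
tail_probs = [0.001, 0.2, 0.02]  # illustrative numbers, not results from the paper
print([edges[i] for i in benjamini_hochberg(tail_probs, 0.05)])
```

Because the procedure is step-up, a hypothesis can be rejected even if its own p-value exceeds its threshold at a smaller rank, as long as a later-ranked p-value meets its threshold.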