Catalogue Search | MBRL
67 result(s) for "R statistical computing"
Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables
by
Heuvelink, Gerard B.M.
,
Gräler, Benedikt
,
Nussbaum, Madlene
in
Algorithms
,
Artificial intelligence
,
Biogeography
2018
Random forest and similar Machine Learning techniques are already used to generate spatial predictions, but spatial location of points (geography) is often ignored in the modeling process. Spatial auto-correlation, especially if still existent in the cross-validation residuals, indicates that the predictions may be biased, and this is suboptimal. This paper presents a random forest for spatial predictions framework (RFsp) where buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process. The RFsp framework is illustrated with examples that use textbook datasets and apply spatial and spatio-temporal prediction to numeric, binary, categorical, multivariate and spatiotemporal variables. Performance of the RFsp framework is compared with the state-of-the-art kriging techniques using fivefold cross-validation with refitting. The results show that RFsp can obtain equally accurate and unbiased predictions as different versions of kriging. Advantages of using RFsp over kriging are that it needs no rigid statistical assumptions about the distribution and stationarity of the target variable, it is more flexible towards incorporating, combining and extending covariates of different types, and it possibly yields more informative maps characterizing the prediction error. RFsp appears to be especially attractive for building multivariate spatial prediction models that can be used as “knowledge engines” in various geoscience fields. Some disadvantages of RFsp are the exponentially growing computational intensity with increase of calibration data and covariates and the high sensitivity of predictions to input data quality. The key to the success of the RFsp framework might be the training data quality, especially the quality of spatial sampling (to minimize extrapolation problems and any type of bias in data) and the quality of model validation (to ensure that accuracy is not affected by overfitting). For many data sets, especially those with lower number of points and covariates and close-to-linear relationships, model-based geostatistics can still lead to more accurate predictions than RFsp.
Journal Article
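The central idea in the abstract above, using distances from observation points as explanatory variables, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name and the coordinates are hypothetical, and plain Euclidean distance is assumed.

```python
import math


def buffer_distance_features(observations, targets):
    """For each target location, return its Euclidean distance to every
    observation point (one feature column per observation point)."""
    return [[math.dist(t, o) for o in observations] for t in targets]


# Hypothetical coordinates, for illustration only.
observations = [(0.0, 0.0), (3.0, 4.0)]
targets = [(0.0, 0.0), (3.0, 0.0)]

features = buffer_distance_features(observations, targets)
for location, row in zip(targets, features):
    print(location, row)
```

Each row could then be passed to a random forest alongside any other covariates, so that geographical proximity enters the model as ordinary features.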
Navigating through the R packages for movement
2020
The advent of miniaturized biologging devices has provided ecologists with unprecedented opportunities to record animal movement across scales, and led to the collection of ever-increasing quantities of tracking data. In parallel, sophisticated tools have been developed to process, visualize and analyse tracking data; however, many of these tools have proliferated in isolation, making it challenging for users to select the most appropriate method for the question in hand. Indeed, within the R software alone, we listed 58 packages created to deal with tracking data or ‘tracking packages’. Here, we reviewed and described each tracking package based on a workflow centred around tracking data (i.e. spatio-temporal locations (x, y, t)), broken down into three stages: pre-processing, post-processing and analysis, the latter consisting of data visualization, track description, path reconstruction, behavioural pattern identification, space use characterization, trajectory simulation and others. Supporting documentation is key to render a package accessible for users. Based on a user survey, we reviewed the quality of packages' documentation and identified 11 packages with good or excellent documentation. Links between packages were assessed through a network graph analysis. Although a large group of packages showed some degree of connectivity (either depending on functions or suggesting the use of another tracking package), one third of the packages worked in isolation, reflecting a fragmentation in the R movement-ecology programming community. Finally, we provide recommendations for users when choosing packages, and for developers to maximize the usefulness of their contribution and strengthen the links within the programming community.
The increased use of biologging devices has propelled the development of methods and software tools for analyzing tracking data. This work reviews 58 R packages for movement, acts as a road map for movement ecologists and offers recommendations for package developers from a user perspective.
Journal Article
A R-Script for Generating Multiple Sclerosis Lesion Pattern Discrimination Plots
by
Sellner, Johann
,
Marschallinger, Hannes
,
Marschallinger, Robert
in
Brain
,
Central nervous system
,
Demyelinating diseases
2021
One significant characteristic of Multiple Sclerosis (MS), a chronic inflammatory demyelinating disease of the central nervous system, is the evolution of highly variable patterns of white matter lesions. Based on geostatistical metrics, the MS-Lesion Pattern Discrimination Plot reduces complex three- and four-dimensional configurations of MS-White Matter Lesions to a well-arranged and standardized two-dimensional plot that facilitates follow-up, cross-sectional and medication impact analysis. Here, we present a script that generates the MS-Lesion Pattern Discrimination Plot, using the widespread statistical computing environment R. Input data to the script are Nifti-1 or Analyze-7.5 files with individual MS-White Matter Lesion masks in Montreal Normal Brain geometry. The MS-Lesion Pattern Discrimination Plot, variogram plots and associated fitting statistics are output to the R console and exported to standard graphics and text files. Besides reviewing relevant geostatistical basics and commenting on implementation details for smooth customization and extension, the paper guides through generating MS-Lesion Pattern Discrimination Plots using publicly available synthetic MS-Lesion patterns. The paper is accompanied by the R script LDPgenerator.r, a small sample data set and associated graphics for comparison.
Journal Article
Optimal Symmetric Multimodal Templates and Concatenated Random Forests for Supervised Brain Tumor Segmentation (Simplified) with ANTsR
by
Shrinidhi, K. L.
,
Durst, Christopher R.
,
Kandel, Benjamin M.
in
Algorithms
,
Bioinformatics
,
Biomedical and Life Sciences
2015
Segmenting and quantifying gliomas from MRI is an important task for diagnosis, planning intervention, and for tracking tumor changes over time. However, this task is complicated by the lack of prior knowledge concerning tumor location, spatial extent, shape, possible displacement of normal tissue, and intensity signature. To accommodate such complications, we introduce a framework for supervised segmentation based on multiple modality intensity, geometry, and asymmetry feature sets. These features drive a supervised whole-brain and tumor segmentation approach based on random forest-derived probabilities. The asymmetry-related features (based on optimal symmetric multimodal templates) demonstrate excellent discriminative properties within this framework. We also gain performance by generating probability maps from random forest models and using these maps for a refining Markov random field regularized probabilistic segmentation. This strategy allows us to interface the supervised learning capabilities of the random forest model with regularized probabilistic segmentation using the recently developed ANTsR package, a comprehensive statistical and visualization interface between the popular Advanced Normalization Tools (ANTs) and the R statistical project. The reported algorithmic framework was the top-performing entry in the MICCAI 2013 Multimodal Brain Tumor Segmentation challenge. The challenge data were widely varying consisting of both high-grade and low-grade glioma tumor four-modality MRI from five different institutions. Average Dice overlap measures for the final algorithmic assessment were 0.87, 0.78, and 0.74 for “complete”, “core”, and “enhanced” tumor components, respectively.
Journal Article
Motivation, values, and work design as drivers of participation in the R open source project for statistical computing
by
Zeileis, Achim
,
Mair, Patrick
,
Gruber, Kathrin
in
Computer science
,
Cooperative Behavior
,
Design
2015
One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This amount of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or influence only specific aspects of participation.
Journal Article
R Packages for Data Quality Assessments and Data Monitoring: A Software Scoping Review with Recommendations for Future Developments
by
Struckmann, Stephan
,
Mariño, Joany
,
Kapsner, Lorenz A.
in
Automation
,
Data analysis
,
data quality
2022
Data quality assessments (DQA) are necessary to ensure valid research results. Despite the growing availability of tools of relevance for DQA in the R language, a systematic comparison of their functionalities is missing. Therefore, we review R packages related to data quality (DQ) and assess their scope against a DQ framework for observational health studies. Based on a systematic search, we screened more than 140 R packages related to DQA in the Comprehensive R Archive Network. From these, we selected packages which target at least three of the four DQ dimensions (integrity, completeness, consistency, accuracy) in a reference framework. We evaluated the resulting 27 packages for general features (e.g., usability, metadata handling, output types, descriptive statistics) and the possible assessment’s breadth. To facilitate comparisons, we applied all packages to a publicly available dataset from a cohort study. We found that the packages’ scope varies considerably regarding functionalities and usability. Only three packages follow a DQ concept, and some offer an extensive rule-based issue analysis. However, the reference framework does not include a few implemented functionalities, and it should be broadened accordingly. Improved use of metadata to empower DQA and user-friendliness enhancement, such as GUIs and reports that grade the severity of DQ issues, stand out as the main directions for future developments.
Journal Article
Innate biology versus lifestyle behaviour in the aetiology of obesity and type 2 diabetes: the GLACIER Study
2016
Aims/hypothesis
We compared the ability of genetic (established type 2 diabetes, fasting glucose, 2 h glucose and obesity variants) and modifiable lifestyle (diet, physical activity, smoking, alcohol and education) risk factors to predict incident type 2 diabetes and obesity in a population-based prospective cohort of 3,444 Swedish adults studied sequentially at baseline and 10 years later.
Methods
Multivariable logistic regression analyses were used to assess the predictive ability of genetic and lifestyle risk factors on incident obesity and type 2 diabetes by calculating the AUC.
Results
The predictive accuracy of lifestyle risk factors was similar to that yielded by genetic information for incident type 2 diabetes (AUC 75% and 74%, respectively) and obesity (AUC 68% and 73%, respectively) in models adjusted for age, age² and sex. The addition of genetic information to the lifestyle model significantly improved the prediction of type 2 diabetes (AUC 80%; p = 0.0003) and obesity (AUC 79%; p < 0.0001) and resulted in a net reclassification improvement of 58% for type 2 diabetes and 64% for obesity.
Conclusions/interpretation
These findings illustrate that lifestyle and genetic information separately provide a similarly high degree of long-range predictive accuracy for obesity and type 2 diabetes.
Journal Article
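The AUC figures quoted in the abstract above measure how often a model scores a case above a non-case. A minimal sketch of that calculation follows; the function name, scores and labels are hypothetical, and the study's own models and data are not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the share of (case, non-case)
    pairs in which the case receives the higher score; ties count as half."""
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks and observed outcomes (1 = event occurred).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

result = auc(scores, labels)
print(result)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the percentages in the abstract are reported.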
Prognostic Significance of Perineural Invasion in Patients with Rectal Cancer using R Environment for Statistical Computing and Graphics
by
Irimie, Alexandru
,
Cazacu, Mircea
,
Vlad, Ioan-Catalin
in
Colorectal cancer
,
Medical prognosis
,
Perineural invasion
2012
In recent studies, perineural invasion (PNI) is associated with poor survival rates in rectal cancer, but the impact of PNI is still controversial. We assessed PNI as a potential prognostic factor in rectal cancer. Patients and Methods: We analyzed 317 patients with rectal cancer resected at The Oncology Institute "Prof. Dr. Ion Chiricuta" Cluj-Napoca, between January 2000 and December 2008. Tumors were reviewed for PNI by a pathologist. Patient data were reviewed and entered into a comprehensive database. The statistical analysis in our study was carried out in the R environment for statistical computing and graphics, version 1.15.1. Overall and disease-free survivals were determined using the Kaplan-Meier method, and multivariate analysis using the Cox multiple hazards model. Results were compared using the log-rank test. Results: In our study PNI was identified in 19% of tumors. The 5-year disease-free survival rate was higher for patients with PNI-negative tumors versus those with PNI-positive tumors (57.31% vs. 36.99%, p=0.009). The 5-year overall survival rate was 59.15% for PNI-negative tumors versus 39.19% for PNI-positive tumors (p=0.014). On multivariate analysis, PNI was an independent prognostic factor for overall survival (Hazard Ratio = 0.6; 95% CI = 0.41 to 0.87; p = 0.0082). Conclusions: PNI can be considered an independent prognostic factor of outcomes in patients with rectal cancer. PNI should be taken into account when selecting patients for adjuvant treatment. The R environment for statistical computing and graphics is complex yet easy-to-use software that has proven to be efficient in our clinical study.
Journal Article
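The Kaplan-Meier method named in the abstract above estimates survival as a running product over event times. The sketch below is a minimal illustration, assuming right-censored data; the function name and follow-up data are hypothetical and unrelated to the study's patients, and the study itself used R rather than this code.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (in years) and event indicators.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why the estimate changes only at observed event times.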
Introduction
by
Chicken, Eric
,
Hollander, Myles
,
Wolfe, Douglas A.
in
direct computation
,
nonparametric statistical methods
,
statistical computing package R
2015
This introduction presents an overview of the key concepts discussed in the subsequent chapters of this book. It discusses the advantages of nonparametric methods, and some real-world applications. The chapter presents 10 examples that sample the types of problems the book will help the reader analyze using nonparametric methods. It then outlines the format of the book, which stresses the application of nonparametric techniques to real data. The basic data, assumptions, and procedures are described precisely in each chapter in the following format. Data and Assumptions are specified before the group of particular procedures discussed. Then, for each technique, the book includes (when applicable) the following subsections: Procedure, Large-Sample Approximation, Ties, Example, Comments, Properties, and Problems. In many of the Example subsections, the book not only illustrates the direct computation of the procedure, but also provides the output obtained using various commands in the statistical computing package R.
Book Chapter
Application of Quantile Regression Analysis to the Rate of Return to Higher Education (分位數迴歸分析在高等教育報酬率的應用)
by
Wang, Shu-Min (王書敏)
,
Wu, Cheng-Ta (吳政達)
in
quantile regression
,
returns to education
,
the R project for statistical computing
2016
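Quantile regression differs from ordinary least squares: besides estimating the median, it can also produce estimates at different quantiles and so describe the entire distribution. Because it offers this more comprehensive account, quantile regression has gradually gained attention across fields. This article explores the application of quantile regression in education. In addition to summarizing the data processing and statistical analysis involved in quantile regression, it uses the rates of return to education by higher-education department over 20 years of education reform in Taiwan as an example to illustrate the method. It is hoped that future research in education can incorporate quantile regression to help explore educational issues in greater depth.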
Journal Article
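Quantile regression, the subject of the record above, replaces the squared-error loss of ordinary least squares with the check (pinball) loss. The sketch below is a minimal illustration of that loss for the simplest case, a constant prediction with no covariates; the function names and the data are hypothetical and do not come from the article.

```python
def pinball_loss(y, prediction, tau):
    """Average check (pinball) loss of a constant prediction at quantile level tau."""
    total = 0.0
    for value in y:
        residual = value - prediction
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1) * residual
    return total / len(y)


def best_constant(y, tau):
    """Pick the observed value that minimises the pinball loss,
    which is an empirical tau-quantile of y."""
    return min(sorted(y), key=lambda c: pinball_loss(y, c, tau))


# Hypothetical, strongly skewed data.
incomes = [1, 2, 3, 4, 100]

median_fit = best_constant(incomes, 0.5)
upper_fit = best_constant(incomes, 0.9)
print(median_fit, upper_fit)
```

With tau = 0.5 the minimiser is the median, which the outlier barely moves, while tau = 0.9 targets the upper tail; fitting several quantile levels in this way is what lets the method describe the whole distribution rather than only its centre.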