Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Series TitleSeries Title
-
Reading LevelReading Level
-
YearFrom:-To:
-
More FiltersMore FiltersContent TypeItem TypeIs Full-Text AvailableSubjectPublisherSourceDonorLanguagePlace of PublicationContributorsLocation
Done
Filters
Reset
20,033
result(s) for
"Information science Statistical methods."
Sort by:
Statistical and machine learning approaches for network analysis
by
Basak, Subhash C
,
Dehmer, Matthias
in
Communication
,
Communication - Network analysis - Graphic methods
,
Graphic methods
2012
\"This book explores novel graph classes and presents novel methods to classify networks. It particularly addresses the following problems: exploration of novel graph classes and their relationships among each other; existing and classical methods to analyze networks; novel graph similarity and graph classification techniques based on machine learning methods; and applications of graph classification and graph mining. Key topics are addressed in depth including the mathematical definition of novel graph classes, i.e. generalized trees and directed universal hierarchical graphs, and the application areas in which to apply graph classes to practical problems in computational biology, computer science, mathematics, mathematical psychology, etc\"--
Statistics for library and information services
by
Friedman, Alon
in
Information science
,
Information science -- Statistical methods
,
Information visualization
2015,2016
Statistics for Library and Information Services, written for non-statisticians, provides logical, user-friendly, and step-by-step instructions to make statistics more accessible for students and professionals in the field of Information Science.
Reality Mining
2014
Big Data is made up of lots of little data: numbers entered into cell phones, addresses entered into GPS devices, visits to websites, online purchases, ATM transactions, and any other activity that leaves a digital trail. Although the abuse of Big Data -- surveillance, spying, hacking -- has made headlines, it shouldn't overshadow the abundant positive applications of Big Data. InReality Mining, Nathan Eagle and Kate Greene cut through the hype and the headlines to explore the positive potential of Big Data, showing the ways in which the analysis of Big Data (\"Reality Mining\") can be used to improve human systems as varied as political polling and disease tracking, while considering user privacy.Eagle, a recognized expert in the field, and Greene, an experienced technology journalist, describe Reality Mining at five different levels: the individual, the neighborhood and organization, the city, the nation, and the world. For each level, they first offer a nontechnical explanation of data collection methods and then describe applications and systems that have been or could be built. These include a mobile app that helps smokers quit smoking; a workplace \"knowledge system\"; the use of GPS, Wi-Fi, and mobile phone data to manage and predict traffic flows; and the analysis of social media to track the spread of disease. Eagle and Greene argue that Big Data, used respectfully and responsibly, can help people live better, healthier, and happier lives.
MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites
2017
Quality control of MRI is essential for excluding problematic acquisitions and avoiding bias in subsequent image processing and analysis. Visual inspection is subjective and impractical for large scale datasets. Although automated quality assessments have been demonstrated on single-site datasets, it is unclear that solutions can generalize to unseen data acquired at new sites. Here, we introduce the MRI Quality Control tool (MRIQC), a tool for extracting quality measures and fitting a binary (accept/exclude) classifier. Our tool can be run both locally and as a free online service via the OpenNeuro.org portal. The classifier is trained on a publicly available, multi-site dataset (17 sites, N = 1102). We perform model selection evaluating different normalization and feature exclusion approaches aimed at maximizing across-site generalization and estimate an accuracy of 76%±13% on new sites, using leave-one-site-out cross-validation. We confirm that result on a held-out dataset (2 sites, N = 265) also obtaining a 76% accuracy. Even though the performance of the trained classifier is statistically above chance, we show that it is susceptible to site effects and unable to account for artifacts specific to new sites. MRIQC performs with high accuracy in intra-site prediction, but performance on unseen sites leaves space for improvement which might require more labeled data and new approaches to the between-site variability. Overcoming these limitations is crucial for a more objective quality assessment of neuroimaging data, and to enable the analysis of extremely large and multi-site samples.
Journal Article
Chemometric analysis in Raman spectroscopy from experimental design to machine learning–based modeling
by
Guo, Shuxia
,
Popp, Jürgen
,
Bocklitz, Thomas
in
631/114/2164
,
639/638/440/527/1821
,
639/705/1042
2021
Raman spectroscopy is increasingly being used in biology, forensics, diagnostics, pharmaceutics and food science applications. This growth is triggered not only by improvements in the computational and experimental setups but also by the development of chemometric techniques. Chemometric techniques are the analytical processes used to detect and extract information from subtle differences in Raman spectra obtained from related samples. This information could be used to find out, for example, whether a mixture of bacterial cells contains different species, or whether a mammalian cell is healthy or not. Chemometric techniques include spectral processing (ensuring that the spectra used for the subsequent computational processes are as clean as possible) as well as the statistical analysis of the data required for finding the spectral differences that are most useful for differentiation between, for example, different cell types. For Raman spectra, this analysis process is not yet standardized, and there are many confounding pitfalls. This protocol provides guidance on how to perform a Raman spectral analysis: how to avoid these pitfalls, and strategies to circumvent problematic issues. The protocol is divided into four parts: experimental design, data preprocessing, data learning and model transfer. We exemplify our workflow using three example datasets where the spectra from individual cells were collected in single-cell mode, and one dataset where the data were collected from a raster scanning–based Raman spectral imaging experiment of mice tissue. Our aim is to help move Raman-based technologies from proof-of-concept studies toward real-world applications.
Raman spectroscopy is increasingly being used in biological assays and studies. This protocol provides guidance for performing chemometric analysis to detect and extract information relating to the chemical differences between biological samples.
Journal Article
PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph
by
Burlot, Laura
,
Planel, Rémi
,
Bazin, Adelme
in
Algorithms
,
Bacteria - classification
,
Bacteria - genetics
2020
Microorganisms have the greatest biodiversity and evolutionary history on earth. At the genomic level, it is reflected by a highly variable gene content even among organisms from the same species which explains the ability of microbes to be pathogenic or to grow in specific environments. We developed a new method called PPanGGOLiN which accurately represents the genomic diversity of a species (i.e. its pangenome) using a compact graph structure. Based on this pangenome graph, we classify genes by a statistical method according to their occurrence in the genomes. This method allowed us to build pangenomes even for uncultivated species at an unprecedented scale. We applied our method on all available genomes in databanks in order to depict the overall diversity of hundreds of species. Overall, our work enables microbiologists to explore and visualize pangenomes alike a subway map.
Journal Article
Comparison of the effects of imputation methods for missing data in predictive modelling of cohort study datasets
by
Zhang, XiangHui
,
Li, Yu
,
Guo, ShuXia
in
Algorithms
,
Cardiovascular disease
,
Cardiovascular Diseases - diagnosis
2024
Background
Missing data is frequently an inevitable issue in cohort studies and it can adversely affect the study's findings. We assess the effectiveness of eight frequently utilized statistical and machine learning (ML) imputation methods for dealing with missing data in predictive modelling of cohort study datasets. This evaluation is based on real data and predictive models for cardiovascular disease (CVD) risk.
Methods
The data is from a real-world cohort study in Xinjiang, China. It includes personal information, physical examination data, questionnaires, and laboratory biochemical results from 10,164 subjects with a total of 37 variables. Simple imputation (Simple), regression imputation (Regression), expectation-maximization(EM), multiple imputation (MICE) , K nearest neighbor classification (KNN), clustering imputation (Cluster), random forest (RF), and decision tree (Cart) were the chosen imputation methods. Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are utilised to assess the performance of different methods for missing data imputation at a missing rate of 20%. The datasets processed with different missing data imputation methods were employed to construct a CVD risk prediction model utilizing the support vector machine (SVM). The predictive performance was then compared using the area under the curve (AUC).
Results
The most effective imputation results were attained by KNN (MAE: 0.2032, RMSE: 0.7438, AUC: 0.730, CI: 0.719-0.741) and RF (MAE: 0.3944, RMSE: 1.4866, AUC: 0.777, CI: 0.769-0.785). The subsequent best performances were achieved by EM, Cart, and MICE, while Simple, Regression, and Cluster attained the worst performances. The CVD risk prediction model was constructed using the complete data (AUC:0.804, CI:0.796-0.812) in comparison with all other models with
p
<0.05.
Conclusion
KNN and RF exhibit superior performance and are more adept at imputing missing data in predictive modelling of cohort study datasets.
Journal Article