Catalogue Search | MBRL
3,467 result(s) for "Multiple measurement analysis"
A topological data analysis based classification method for multiple measurements
2020
Background
Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements that samples from the data space and builds a network graph based on the data topology. A machine learning model with cross-validation is then applied for classification. When tested on three case studies, its accuracy exceeds that of an alternative support vector machine (SVM) voting model in most of the situations tested, with additional benefits such as reporting high-purity data subsets along with their feature values.
Results
For 100 examples of 3 different tree species, the model reached 80% classification accuracy after 30 datapoints, rising to 90% when sampling was increased to 400 datapoints. The alternative SVM classifier achieved a maximum accuracy of 68.7%. Using 100 examples from each class of 6 different random point processes, the classifier achieved 96.8% accuracy, vastly outperforming the SVM. For two outcomes in neuron spiking data, the TDA classifier was as accurate as the SVM in one case (both converged to 97.8% accuracy) but was outperformed in the other (accuracies of 79.8% and 92.2%, respectively).
Conclusions
This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.
Journal Article
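The abstract describes the graph construction only at a high level. A common way to build a network graph from data topology is the Mapper construction: cover the range of a filter function with overlapping intervals, cluster the points within each interval, and connect clusters that share points. The sketch below shows that construction plus per-node label purity. It is an illustration under assumptions (a one-dimensional filter, single-linkage clustering with a hypothetical distance threshold `eps`, and hypothetical function names), not the authors' software, and it omits their sampling and cross-validated classification steps.

```python
import math
from collections import Counter
from itertools import combinations


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal intervals, each widened on both sides by
    `overlap` (a fraction of the interval length) so neighbours overlap."""
    length = (hi - lo) / n_intervals
    pad = length * overlap
    return [(lo + i * length - pad, lo + (i + 1) * length + pad)
            for i in range(n_intervals)]


def link_clusters(indices, points, eps):
    """Single-linkage clustering: connected components of the graph that
    joins two points when their Euclidean distance is below eps."""
    unvisited, clusters = set(indices), []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if math.dist(points[i], points[j]) < eps]
            for j in near:
                unvisited.remove(j)
                component.add(j)
                stack.append(j)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, labels, filter_fn, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges, purity) for a Mapper-style graph (hypothetical helper).

    nodes: clusters of point indices; edges: pairs of nodes sharing a point;
    purity: (majority label, fraction of members with that label) per node."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    nodes = []
    for a, b in cover_intervals(lo, hi, n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(link_clusters(members, points, eps))
    edges = {(s, t) for s, t in combinations(range(len(nodes)), 2)
             if nodes[s] & nodes[t]}
    purity = []
    for node in nodes:
        label, count = Counter(labels[i] for i in node).most_common(1)[0]
        purity.append((label, count / len(node)))
    return nodes, edges, purity
```

Nodes whose purity is close to 1.0 correspond to the kind of "data subsets with high purity" the abstract mentions; the filter function and cover parameters largely determine how coarse or fine the graph is.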
Nature in malls: Effects of a natural environment on the cognitive image, emotional response, and behaviors of visitors
2019
This study assesses the role of a natural environment and its effects on three components of attitudes: cognitive image, affective response, and behavioral intentions. Using a survey of 292 mall visitors, this study also examines how the perceived atmosphere of a mall can indirectly affect behavioral intentions. The findings confirm that the cognitive image components, namely appealing design features, may positively influence affective responses at malls. Affective response, in turn, positively affects the behavioral intentions of mall visitors. A multiple measurement analysis showed that affective response features influenced visitors' behavioral intentions more strongly than cognitive image and natural atmosphere attributes did. Theoretical and practical implications are also discussed.
Journal Article
A topological data analysis based classification method for multiple measurements
2019
Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements that samples from the data space and builds a network graph based on the data topology. When applied to two case studies, its accuracy exceeds that of alternative models, with additional benefits such as reporting high-purity data subsets along with their feature values.
For 300 examples of 3 tree species, accuracy reached 80% after 30 datapoints, rising to 90% when sampling was increased to 400 datapoints. Using 100 examples of each of 6 point processes, the classifier achieved 96.8% accuracy. In both datasets, the TDA classifier outperformed an alternative model.
This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.
Evaluation of Harmonic Contributions for Multi Harmonic Sources System Based on Mixed Entropy Screening and an Improved Independent Component Analysis Method
by Zhao, Jinshuai; Xu, Fangwei; Yang, Honggeng
in asynchronous measurement, complex independent component analysis, harmonic contribution
2020
Evaluating the harmonic contribution of each nonlinear customer is important for harmonic mitigation in a power system with diverse and complex harmonic sources. Existing evaluation methods have two shortcomings: (1) their calculation accuracy is easily affected by fluctuations in background harmonics; and (2) they rely on GPS-synchronized (Global Positioning System) measurements, which are uneconomical when widely deployed. In this paper, based on the properties of asynchronous measurements, we propose a model for evaluating harmonic contributions without GPS technology. In addition, based on the Gaussianity of the measured harmonic data, a mixed entropy screening mechanism is proposed to assess how strongly the background harmonics fluctuate in each data segment. Only segments with relatively stable background harmonics are used in the calculation, which reduces the impact of background harmonics to a certain degree. This paper also improves complex independent component analysis, a promising method in this field: during the calculation, the sparseness of the mixing matrix is exploited to reduce the optimization dimension and enhance evaluation accuracy. The validity and effectiveness of the proposed methods are verified through simulations and field case studies.
Journal Article
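The abstract does not give the formula for the mixed entropy measure. As a stand-in, the sketch below scores fixed-length segments with the classical moment-based negentropy approximation (near zero for Gaussian data) and keeps a fraction of them. This is a swapped-in technique, not the authors' mechanism: the segment length, the keep fraction, and whether low or high scores indicate "stable background harmonics" (`prefer_low`) are hypothetical choices.

```python
import math


def negentropy_approx(segment):
    """Moment-based negentropy approximation of one segment:
    J ~ skew^2 / 12 + excess_kurtosis^2 / 48, which is near zero for Gaussian data."""
    n = len(segment)
    mean = sum(segment) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in segment) / n)
    if sd == 0:
        raise ValueError("constant segment: moments are undefined")
    y = [(x - mean) / sd for x in segment]
    skew = sum(v ** 3 for v in y) / n
    excess_kurt = sum(v ** 4 for v in y) / n - 3.0
    return skew ** 2 / 12.0 + excess_kurt ** 2 / 48.0


def score_segments(signal, seg_len):
    """Score non-overlapping windows of seg_len samples; constant windows are skipped."""
    scores = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = signal[start:start + seg_len]
        if max(seg) == min(seg):
            continue
        scores.append((start, negentropy_approx(seg)))
    return scores


def select_segments(scores, keep_fraction=0.5, prefer_low=True):
    """Keep the best-ranked fraction of segments (hypothetical selection rule)."""
    ranked = sorted(scores, key=lambda s: s[1], reverse=not prefer_low)
    k = max(1, int(len(ranked) * keep_fraction))
    return sorted(start for start, _ in ranked[:k])
```

Returning segment start indices keeps the screening step separate from whatever contribution estimator is run afterwards on the retained data.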
Identifying a Correlation among Qualitative Non-Numeric Parameters in Natural Fish Microbe Dataset Using Machine Learning
by Hideaki Shima; Taiga Asakura; Jun Kikuchi
in Algorithms, association rule mining, association rules
2022
Recent technical innovations and developments in computer-based technology have enabled bioscience researchers to acquire comprehensive datasets and identify unique parameters within experimental datasets. However, field researchers may find that their datasets show few associations among measurement results (e.g., from analytical instruments, phenotype observations, and field environmental data) and may contain non-numerical, qualitative parameters, both of which make statistical analysis difficult. Here, we propose an advanced analysis scheme that combines two machine learning steps to mine association rules between non-numerical parameters. The aim of this analysis is to identify relationships between variables and to enable the visualization of association rules in data from samples collected in the field, which show few correlations among genetic, physical, and non-numerical qualitative parameters. The analysis scheme presented here may increase the potential to identify important characteristics of big datasets.
Journal Article
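The abstract does not detail the two machine learning steps. As background, the standard quantities behind association rule mining (support, confidence, and lift) can be sketched by brute-force enumeration over categorical records. The item encoding ("parameter=value" strings), the thresholds, and the function name are assumptions for illustration; a real miner such as Apriori prunes candidates instead of enumerating every itemset.

```python
from itertools import combinations


def mine_rules(records, min_support=0.3, min_confidence=0.7, max_size=3):
    """Brute-force association rules LHS -> RHS (hypothetical helper).

    records: list of sets of categorical items such as "habitat=river".
    Returns (lhs, rhs, support, confidence, lift) tuples."""
    n = len(records)

    def support(itemset):
        # Fraction of records containing every item in the itemset.
        return sum(1 for r in records if itemset <= r) / n

    items = sorted(set().union(*records))
    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s < min_support:
                continue
            # Split each frequent itemset into every non-empty LHS/RHS pair.
            for k in range(1, size):
                for lhs_items in combinations(combo, k):
                    lhs = frozenset(lhs_items)
                    rhs = itemset - lhs
                    confidence = s / support(lhs)
                    if confidence >= min_confidence:
                        # Lift > 1 means LHS and RHS co-occur more than if independent.
                        lift = confidence / support(rhs)
                        rules.append((lhs, rhs, s, confidence, lift))
    return rules
```

Because qualitative parameters are encoded as items, no numerical correlation is needed: a rule is reported whenever co-occurrence is frequent and reliable enough.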
Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains
2018
Existing approaches for multivariate functional principal component analysis are restricted to data on the same one-dimensional interval. The presented approach focuses on multivariate functional data on different domains that may differ in dimension, such as functions and images. The theoretical basis for multivariate functional principal component analysis is given in terms of a Karhunen-Loève Theorem. For the practically relevant case of a finite Karhunen-Loève representation, a relationship between univariate and multivariate functional principal component analysis is established. This offers an estimation strategy to calculate multivariate functional principal components and scores based on their univariate counterparts. For the resulting estimators, asymptotic results are derived. The approach can be extended to finite univariate expansions in general, not necessarily in orthonormal bases. It is also applicable to sparse functional data or data with measurement error. A flexible R implementation is available on CRAN. The new method is shown to be competitive with existing approaches for data observed on a common one-dimensional domain. The motivating application is a neuroimaging study, where the goal is to explore how longitudinal trajectories of a neuropsychological test score covary with FDG-PET brain scans at baseline. Supplementary material, including detailed proofs, additional simulation results, and software, is available online.
Journal Article
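For the finite Karhunen-Loève case, the link between univariate and multivariate components can be sketched as follows. This assumes orthonormal univariate eigenfunctions and unweighted elements, and the notation is ours rather than the paper's: each element $j = 1,\dots,p$ has univariate eigenfunctions $\phi^{(j)}_n$ with scores $\xi^{(j)}_n$, and $[c_m]^{(j)}$ denotes the block of the eigenvector $c_m$ belonging to element $j$.

```latex
\begin{align*}
\Xi &= \bigl(\xi^{(1)}_1,\dots,\xi^{(1)}_{M_1},\dots,\xi^{(p)}_1,\dots,\xi^{(p)}_{M_p}\bigr)^{\top},
\qquad Z = \operatorname{Cov}(\Xi) \in \mathbb{R}^{M_+ \times M_+},
\qquad M_+ = \sum_{j=1}^{p} M_j, \\
Z\,c_m &= \nu_m\,c_m, \qquad \nu_1 \ge \nu_2 \ge \dots \ge \nu_{M_+}, \\
\psi^{(j)}_m(t_j) &= \sum_{n=1}^{M_j} [c_m]^{(j)}_n\,\phi^{(j)}_n(t_j),
\qquad
\rho_m = \sum_{j=1}^{p} \sum_{n=1}^{M_j} [c_m]^{(j)}_n\,\xi^{(j)}_n = c_m^{\top}\,\Xi .
\end{align*}
```

Under these assumptions the eigenvalues $\nu_m$ of $Z$ are the multivariate eigenvalues, $\psi_m = (\psi^{(1)}_m,\dots,\psi^{(p)}_m)$ are the multivariate eigenfunctions, and $\rho_m$ are the multivariate scores, so only a matrix of size $M_+ \times M_+$ has to be decomposed regardless of the dimension of each domain.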
A Practitioner's Guide to Cluster-Robust Inference
2015
We consider statistical inference for regression when data are grouped into clusters, with regression model errors independent across clusters but correlated within clusters. Examples include data on individuals with clustering on village or region or another category such as industry, and state-year differences-in-differences studies with clustering on state. In such settings, default standard errors can greatly overstate estimator precision. Instead, if the number of clusters is large, statistical inference after OLS should be based on cluster-robust standard errors. We outline the basic method as well as many complications that can arise in practice. These include cluster-specific fixed effects, few clusters, multiway clustering, and estimators other than OLS.
Journal Article
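The basic method the abstract refers to is the cluster-robust "sandwich" variance estimator. A minimal sketch for a regression of `y` on an intercept and one regressor follows; the optional scaling G/(G-1) × (N-1)/(N-K) is one common small-sample correction, and the function name and single-regressor restriction are assumptions made to keep the example short.

```python
import math
from collections import defaultdict


def ols_cluster_se(x, y, clusters, small_sample=True):
    """OLS of y on [1, x] with cluster-robust standard errors (hypothetical helper).

    V = (X'X)^-1 (sum_g s_g s_g') (X'X)^-1 with s_g = X_g' u_g, optionally
    scaled by G/(G-1) * (N-1)/(N-K). Requires at least two clusters."""
    n, k = len(x), 2
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx
    b = (n * sxy - sx * sy) / det          # slope from the normal equations
    a = (sy - b * sx) / n                  # intercept
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^-1
    # Per-cluster score sums s_g = sum over the cluster of u_i * [1, x_i].
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, clusters):
        u = yi - a - b * xi
        scores[g][0] += u
        scores[g][1] += u * xi
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for r in range(2):
            for c in range(2):
                meat[r][c] += s[r] * s[c]
    G = len(scores)
    scale = (G / (G - 1)) * ((n - 1) / (n - k)) if small_sample else 1.0

    def matmul(A, B):
        return [[sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2)]
                for r in range(2)]

    V = matmul(matmul(inv, meat), inv)
    return a, b, math.sqrt(scale * V[0][0]), math.sqrt(scale * V[1][1])
```

With every observation in its own cluster and no scaling, the estimator reduces to the heteroskedasticity-robust (White) variance; grouping correlated observations into shared clusters lets within-cluster error correlation enter through the cross terms of s_g s_g'.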
AI versus human-generated multiple-choice questions for medical education: a cohort study in a high-stakes examination
2025
Background
The creation of high-quality multiple-choice questions (MCQs) is essential for medical education assessments but is resource-intensive and time-consuming when done by human experts. Large language models (LLMs) like ChatGPT-4o offer a promising alternative, but their efficacy remains unclear, particularly in high-stakes exams.
Objective
This study aimed to evaluate the quality and psychometric properties of ChatGPT-4o-generated MCQs compared to human-created MCQs in a high-stakes medical licensing exam.
Methods
A prospective cohort study was conducted among medical doctors preparing for the Primary Examination on Emergency Medicine (PEEM) organised by the Hong Kong College of Emergency Medicine in August 2024. Participants attempted two sets of 100 MCQs: one AI-generated and one human-generated. Expert reviewers assessed MCQs for factual correctness, relevance, difficulty, alignment with Bloom’s taxonomy (remember, understand, apply and analyse), and item writing flaws. Psychometric analyses were performed, including difficulty and discrimination indices and KR-20 reliability. Candidate performance and time efficiency were also evaluated.
Results
Among 24 participants, AI-generated MCQs were easier (mean difficulty index = 0.78 ± 0.22 vs. 0.69 ± 0.23, p < 0.01) but showed similar discrimination indices to human MCQs (mean = 0.22 ± 0.23 vs. 0.26 ± 0.26). Agreement was moderate (ICC = 0.62, p = 0.01, 95% CI: 0.12–0.84). Expert reviews identified more factual inaccuracies (6% vs. 4%), irrelevance (6% vs. 0%), and inappropriate difficulty levels (14% vs. 1%) in AI MCQs. AI questions primarily tested lower-order cognitive skills, while human MCQs better assessed higher-order skills (χ² = 14.27, p = 0.003). AI significantly reduced time spent on question generation (24.5 vs. 96 person-hours).
Conclusion
ChatGPT-4o demonstrates the potential for efficiently generating MCQs but lacks the depth needed for complex assessments. Human review remains essential to ensure quality. Combining AI efficiency with expert oversight could optimise question creation for high-stakes exams, offering a scalable model for medical education that balances time efficiency and content quality.
Journal Article
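The abstract names difficulty and discrimination indices and KR-20 reliability without defining them. The conventional textbook definitions can be sketched as follows: difficulty is the proportion of candidates answering an item correctly (so a higher value means an easier item, consistent with the AI items being "easier" at 0.78), discrimination is the upper-group minus lower-group proportion correct, and KR-20 compares item variances with total-score variance. The 27% group split, the use of population variance, and the function name are assumptions; the study may have used other conventions, such as a point-biserial discrimination index.

```python
def item_statistics(responses, group_fraction=0.27):
    """Classical item analysis for dichotomously scored items (hypothetical helper).

    responses: one row per candidate, each a list of 0/1 item scores.
    Returns (difficulty, discrimination, kr20)."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    # Difficulty index: proportion correct per item (higher = easier).
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    # Discrimination index: upper-group minus lower-group proportion correct.
    order = sorted(range(n), key=lambda r: totals[r])
    g = max(1, round(n * group_fraction))
    lower, upper = order[:g], order[-g:]
    d = [sum(responses[r][i] for r in upper) / g
         - sum(responses[r][i] for r in lower) / g
         for i in range(k)]
    # KR-20 with the population variance of total scores.
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    kr20 = (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / var_t)
    return p, d, kr20
```

In this framing, the similar discrimination indices reported above mean that AI and human items separated stronger from weaker candidates to a similar degree, even though the AI items were easier on average.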
A Model of Text for Experimentation in the Social Sciences
by Stewart, Brandon M.; Roberts, Margaret E.; Airoldi, Edoardo M.
in Applications and Case Studies, Causal inference, China
2016
Statistical models of text have become increasingly popular in statistics and computer science as a method of exploring large document collections. Social scientists often want to move beyond exploration, to measurement and experimentation, and make inference about social and political processes that drive discourse and content. In this article, we develop a model of text data that supports this type of substantive research. Our approach is to posit a hierarchical mixed membership model for analyzing topical content of documents, in which mixing weights are parameterized by observed covariates. In this model, topical prevalence and topical content are specified as a simple generalized linear model on an arbitrary number of document-level covariates, such as news source and time of release, enabling researchers to introduce elements of the experimental design that informed document collection into the model, within a generally applicable framework. We demonstrate the proposed methodology by analyzing a collection of news reports about China, where we allow the prevalence of topics to evolve over time and vary across newswire services. Our methods quantify the effect of newswire source on both the frequency and nature of topic coverage. Supplementary materials for this article are available online.
Journal Article
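To illustrate how document-level covariates can parameterize mixing weights, the sketch below maps covariates through a linear predictor and a softmax (a logistic-normal prior) to topic proportions. Only that mapping is shown. The coefficient values, the convention of fixing the last topic's linear predictor at 0, the independent Gaussian noise `sigma` (a simplification of a full covariance matrix), and the function name are assumptions; model fitting is not shown.

```python
import math
import random


def topic_prevalence(covariates, gamma, sigma=0.0, rng=None):
    """Map document covariates to topic proportions (hypothetical helper).

    covariates: list of floats x_d (include 1.0 for an intercept).
    gamma: K-1 coefficient vectors, one per non-reference topic.
    The K-th (reference) topic has its linear predictor fixed at 0."""
    eta = [sum(g * x for g, x in zip(coefs, covariates)) for coefs in gamma]
    if sigma > 0:
        if rng is None:
            rng = random.Random(0)
        # Independent Gaussian noise stands in for the full covariance.
        eta = [e + rng.gauss(0.0, sigma) for e in eta]
    eta.append(0.0)
    m = max(eta)                      # subtract the max for numerical stability
    expo = [math.exp(e - m) for e in eta]
    total = sum(expo)
    return [e / total for e in expo]
```

For example, with a binary covariate such as "news source is A", a positive coefficient on that covariate for topic 0 raises topic 0's expected share in documents from source A relative to the others.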
Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records
by Fifield, Benjamin; Imai, Kosuke; Enamorado, Ted
in Administrative records, Algorithms, Archives & records
2019
Since most social science research relies on multiple data sources, merging data sets is an essential part of researchers’ workflow. Unfortunately, a unique identifier that unambiguously links records is often unavailable, and data may contain missing and inaccurate information. These problems are especially severe when merging large-scale administrative records. We develop a fast and scalable algorithm to implement a canonical model of probabilistic record linkage that has many advantages over deterministic methods frequently used by social scientists. The proposed methodology efficiently handles millions of observations while accounting for missing data and measurement error, incorporating auxiliary information, and adjusting for uncertainty about merging in post-merge analyses. We conduct comprehensive simulation studies to evaluate the performance of our algorithm in realistic scenarios. We also apply our methodology to merging campaign contribution records, survey data, and nationwide voter files. An open-source software package is available for implementing the proposed methodology.
Journal Article
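The canonical model of probabilistic record linkage is usually taken to be the Fellegi-Sunter model, in which each record pair is summarized by field-by-field agreement indicators. In practice the m-probabilities (agreement given a match) and u-probabilities (agreement given a non-match) are estimated from the data, typically by EM; the sketch below assumes they are already known and shows only how an agreement pattern becomes a posterior match probability. Treating a missing comparison (`None`) by simply skipping that field is a simplifying assumption, and the function name is hypothetical.

```python
import math


def match_posterior(agreement, m, u, prior_match):
    """Posterior match probability for one record pair (hypothetical helper).

    agreement: per-field 1 (agree), 0 (disagree), or None (missing, skipped).
    m[k]: P(agree on field k | match); u[k]: P(agree on field k | non-match).
    Fields are assumed conditionally independent given match status."""
    log_m = math.log(prior_match)
    log_u = math.log(1.0 - prior_match)
    for gamma, mk, uk in zip(agreement, m, u):
        if gamma is None:
            continue  # missing comparison contributes no evidence
        log_m += math.log(mk if gamma else 1.0 - mk)
        log_u += math.log(uk if gamma else 1.0 - uk)
    # Normalize in log space to avoid underflow with many fields.
    top = max(log_m, log_u)
    pm, pu = math.exp(log_m - top), math.exp(log_u - top)
    return pm / (pm + pu)
```

Unlike a deterministic exact-match rule, this yields a graded probability for every pair, which is what allows uncertainty about merging to be carried into post-merge analyses.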