Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
26,483 result(s) for "public dataset"
HARTH: A Human Activity Recognition Dataset for Machine Learning
by Bårdstu, Hilde Bremseth; Mork, Paul Jarle; Bach, Kerstin
in Accelerometers; Annotations; benchmark
2021
Existing accelerometer-based human activity recognition (HAR) benchmark datasets that were recorded during free living suffer from non-fixed sensor placement, the usage of only one sensor, and unreliable annotations. We make two contributions in this work. First, we present the publicly available Human Activity Recognition Trondheim dataset (HARTH). Twenty-two participants were recorded for 90 to 120 min during their regular working hours using two three-axial accelerometers, attached to the thigh and lower back, and a chest-mounted camera. Experts annotated the data independently using the camera’s video signal and achieved high inter-rater agreement (Fleiss’ Kappa = 0.96). They labeled twelve activities. The second contribution of this paper is the training of seven different baseline machine learning models for HAR on our dataset. We used a support vector machine, k-nearest neighbor, random forest, extreme gradient boost, convolutional neural network, bidirectional long short-term memory, and convolutional neural network with multi-resolution blocks. The support vector machine achieved the best results with an F1-score of 0.81 ± 0.18, recall of 0.85 ± 0.13, and precision of 0.79 ± 0.22 (mean ± standard deviation) in a leave-one-subject-out cross-validation. Our highly professional recordings and annotations provide a promising benchmark dataset for researchers to develop innovative machine learning approaches for precise HAR in free living.
Journal Article
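The leave-one-subject-out evaluation described in this abstract can be sketched with scikit-learn's `LeaveOneGroupOut` splitter. The data below is synthetic; the real HARTH features, twelve activity classes, and SVM hyperparameters are not reproduced here:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for windowed accelerometer features:
# 5 "subjects", 40 windows each, 6 features (e.g. mean/std per axis).
rng = np.random.default_rng(0)
n_subjects, n_windows, n_features = 5, 40, 6
X = rng.normal(size=(n_subjects * n_windows, n_features))
y = rng.integers(0, 3, size=n_subjects * n_windows)   # 3 toy activity classes
groups = np.repeat(np.arange(n_subjects), n_windows)  # subject id per window

# Leave-one-subject-out: each fold holds out every window of one subject,
# so the model is always evaluated on a person it has never seen.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, groups=groups, cv=LeaveOneGroupOut())
print(len(scores))  # one accuracy per held-out subject → 5
```

Because the labels here are random, the per-subject accuracies hover around chance; on real data this protocol is what yields the per-subject mean ± standard deviation reported above.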
A Large-Scale Open Motion Dataset (KFall) and Benchmark Algorithms for Detecting Pre-impact Fall of the Elderly Using Wearable Inertial Sensors
by Yu, Xiaoqun; Jang, Jaehyuk; Xiong, Shuping
in Activities of daily living; algorithm development; Algorithms
2021
Research on pre-impact fall detection with wearable inertial sensors (detecting fall accidents prior to body-ground impact) has grown rapidly in the past decade due to its great potential for developing an on-demand fall-related injury prevention system. However, most researchers use their own datasets to develop fall detection algorithms and rarely make these datasets publicly available, which poses a challenge to fairly evaluating the performance of different algorithms on a common basis. Even though some open datasets have been established recently, most of them are impractical for pre-impact fall detection due to the lack of temporal labels for fall time and the limited types of motions. To overcome these limitations, in this study we proposed and publicly provided a large-scale motion dataset called “KFall,” developed from 32 Korean participants wearing an inertial sensor on the low back while performing 21 types of activities of daily living and 15 types of simulated falls. In addition, ready-to-use temporal labels of the fall time, based on synchronized motion videos, were published along with the dataset. These enhancements make KFall the first public dataset suitable for pre-impact fall detection, not just post-fall detection. Importantly, we have also developed three different types of up-to-date algorithms (threshold-based, support vector machine, and deep learning) using the KFall dataset for pre-impact fall detection, so that researchers and practitioners can flexibly choose the corresponding algorithm. The deep learning algorithm achieved both high overall accuracy and balanced sensitivity (99.32%) and specificity (99.01%) for pre-impact fall detection. The support vector machine also demonstrated good performance, with a sensitivity of 99.77% and specificity of 94.87%. The threshold-based algorithm, however, showed relatively poor results: its specificity (83.43%) was much lower than its sensitivity (95.50%). The performance of these algorithms can be regarded as a benchmark for the further development of better algorithms on this new dataset. This large-scale motion dataset and the benchmark algorithms provide researchers and practitioners with valuable data and references for developing new technologies and strategies for pre-impact fall detection and proactive injury prevention for the elderly.
Journal Article
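A threshold-based pre-impact detector of the kind benchmarked above can be sketched in a few lines: integrate vertical acceleration into velocity and raise an alarm once the velocity crosses a threshold, before the impact itself. The sampling rate, threshold value, and simulated free-fall signal below are illustrative assumptions, not values from the KFall paper:

```python
import numpy as np

FS = 100.0           # sample rate (Hz); assumed for this sketch
G = 9.81
VEL_THRESHOLD = 1.3  # vertical-velocity trigger (m/s); illustrative value

def detect_pre_impact(acc, fs=FS, threshold=VEL_THRESHOLD):
    """Return the first sample index where the integrated vertical
    velocity exceeds `threshold`, or None. `acc` is vertical
    acceleration in m/s^2 with gravity removed."""
    vel = np.cumsum(acc) / fs  # simple rectangular integration
    hits = np.flatnonzero(np.abs(vel) > threshold)
    return int(hits[0]) if hits.size else None

# Simulated fall: 1 s of standing still, then 0.5 s of free fall
# (the sensor reads -g), then "impact" at sample 150.
acc = np.concatenate([np.zeros(100), np.full(50, -G), np.zeros(50)])
idx = detect_pre_impact(acc)
print(idx)  # → 113, i.e. the alarm fires before the impact at sample 150
```

Real detectors of this family typically combine several features (acceleration magnitude, trunk angle) rather than a single integral, which is one reason the paper's threshold algorithm trails the learned ones.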
A survey on video-based Human Action Recognition: recent updates, datasets, challenges, and applications
2021
Human Action Recognition (HAR) involves monitoring human activity in areas such as medicine, education, entertainment, visual surveillance, and video retrieval, as well as identifying abnormal activities, to name a few. Due to the increasing use of cameras, automated systems are in demand for classifying such activities using computationally intelligent techniques such as Machine Learning (ML) and Deep Learning (DL). In this survey, we discuss various ML and DL techniques for HAR covering the years 2011–2019. The paper discusses the characteristics of public datasets used for HAR. It also surveys various action recognition techniques along with HAR applications, namely content-based video summarization, human–computer interaction, education, healthcare, video surveillance, abnormal activity detection, sports, and entertainment. The advantages and disadvantages of action representation, dimensionality reduction, and action analysis methods are also provided. The paper closes with challenges and future directions for HAR.
Journal Article
A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research—An International Collaboration
2021
As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics in such a unique worldwide event for biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter generated from 1 January 2020 to 27 June 2021 at the time of writing. This provides an additional, freely available data source for researchers worldwide to conduct a wide and diverse range of research projects, such as epidemiological analyses, emotional and mental responses to social distancing measures, identification of sources of misinformation, and stratified measurement of sentiment towards the pandemic in near real time, among many others.
Journal Article
Review of public motor imagery and execution datasets in brain-computer interfaces
by Gwon, Daeun; Ahn, Minkyu; Song, Minseok
in Accuracy; Brain research; brain-computer interface (BCI)
2023
The demand for public datasets has increased as data-driven methodologies have been introduced in the field of brain-computer interfaces (BCIs). Indeed, many BCI datasets are available in various platforms and repositories on the web, and the number of studies employing these datasets appears to be increasing. Motor imagery is one of the significant control paradigms in the BCI field, and many datasets related to motor tasks are already open to the public. However, to the best of our knowledge, no study has yet investigated and evaluated these datasets, although data quality is essential for reliable results and for the design of subject- or system-independent BCIs. In this study, we conducted a thorough investigation of motor imagery/execution EEG datasets recorded from healthy participants and published over the past 13 years. The 25 datasets were collected from six repositories and subjected to a meta-analysis. In particular, we reviewed the specifications of the recording settings and experimental design, and evaluated data quality, measured by the classification accuracy of standard algorithms such as Common Spatial Pattern (CSP) and Linear Discriminant Analysis (LDA), for comparison and compatibility across the datasets. We found that various stimulation types, such as text, figures, or arrows, were used to instruct subjects what to imagine, and that the length of each trial also differed, ranging from 2.5 to 29 s with a mean of 9.8 s. Typically, each trial consisted of multiple sections: pre-rest (2.38 s), imagination ready (1.64 s), imagination (4.26 s, ranging from 1 to 10 s), and post-rest (3.38 s). In a meta-analysis of a total of 861 sessions from all datasets, the mean classification accuracy for the two-class (left-hand vs. right-hand motor imagery) problem was 66.53%, and the proportion of BCI poor performers (those who are unable to reach proficiency in using a BCI system) was 36.27% according to the estimated accuracy distribution. Further, we analyzed the CSP features and found that each dataset forms a cluster, and that some datasets overlap in the feature space, indicating a greater similarity among them. Finally, we checked the minimal essential information (continuous signals, event type/latency, and channel information) that should be included in a dataset for convenient use, and found that only 71% of the datasets met those criteria. Our attempt to evaluate and compare the public datasets is timely, and these results will contribute to understanding dataset quality and recording settings, as well as the use of public datasets in future work on BCIs.
Journal Article
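The CSP-plus-LDA pipeline used as the quality yardstick in the review above can be sketched on synthetic two-class "EEG". The minimal CSP implementation and the toy trial data below are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def csp_filters(X1, X2, n_pairs=2):
    """Minimal Common Spatial Patterns. X1, X2 are (trials, channels,
    samples) arrays for the two classes. Returns spatial filters that
    maximize variance for one class while minimizing it for the other."""
    def mean_cov(X):
        covs = [x @ x.T / np.trace(x @ x.T) for x in X]
        return np.mean(covs, axis=0)
    C1, C2 = mean_cov(X1), mean_cov(X2)
    # Generalized eigenvalue problem C1 w = lambda (C1 + C2) w;
    # extreme eigenvalues give the most discriminative filters.
    vals, vecs = eigh(C1, C1 + C2)
    order = np.argsort(vals)
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]
    return vecs[:, picks].T

def log_var_features(W, X):
    Z = np.einsum("fc,tcs->tfs", W, X)  # apply spatial filters per trial
    var = Z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))

# Synthetic two-class data: 8 channels, 250 samples per trial; each class
# has one artificially "active" channel with inflated variance.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(30, 8, 250)); X1[:, 0] *= 3.0  # class 1: ch0 active
X2 = rng.normal(size=(30, 8, 250)); X2[:, 1] *= 3.0  # class 2: ch1 active
W = csp_filters(X1, X2)
X = np.vstack([log_var_features(W, X1), log_var_features(W, X2)])
y = np.r_[np.zeros(30), np.ones(30)]
clf = LinearDiscriminantAnalysis().fit(X, y)
print(clf.score(X, y) > 0.85)  # easily separable on this toy data
```

On real motor-imagery trials this same feature-then-LDA recipe is what produces the roughly 66% two-class accuracies reported in the meta-analysis.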
Diverse Dataset for Eyeglasses Detection: Extending the Flickr-Faces-HQ (FFHQ) Dataset
2024
Facial analysis is an important area of research in computer vision and machine learning, with applications spanning security, healthcare, and user interaction systems. The data-centric AI approach emphasizes the importance of high-quality, diverse, and well-annotated datasets in driving advancements in this field. However, current facial datasets, such as Flickr-Faces-HQ (FFHQ), lack detailed annotations for detecting facial accessories, particularly eyeglasses. This work addresses this limitation by extending the FFHQ dataset with precise bounding box annotations for eyeglasses detection, enhancing its utility for data-centric AI applications. The extended dataset comprises 70,000 images, including over 16,000 images containing eyewear, and exceeds the CelebAMask-HQ dataset in size and diversity. A semi-automated protocol was employed to efficiently generate accurate bounding box annotations, minimizing the demand for extensive manual labeling. This enriched dataset serves as a valuable resource for training and benchmarking eyewear detection models. Additionally, baseline benchmark results for eyeglasses detection are presented using deep learning methods, including YOLOv8 and MobileNetV3. The evaluation, conducted through cross-dataset validation, demonstrated the robustness of models trained on the extended FFHQ dataset, which outperformed those trained on the existing alternative, CelebAMask-HQ. The extended dataset, which has been made publicly available, is expected to support future research and development in eyewear detection, contributing to advancements in facial analysis and related fields.
Journal Article
eHomeSeniors Dataset: An Infrared Thermal Sensor Dataset for Automatic Fall Detection Research
by Rodenas, Tomás; Taramasco, Carla; Espinoza, Cristina
in Accidental Falls; Activities of daily living; Adult
2019
Automatic fall detection is a very active research area, which has grown explosively since the 2010s, with a particular focus on elderly care. Rapid detection of falls enables timely assistance to the injured person, reducing a series of negative consequences for the health of the elderly. Currently, there are several fall detection systems (FDSs), mostly based on predictive and machine-learning approaches. These algorithms rely on different data sources, such as wearable devices, ambient sensors, or vision/camera-based approaches. While wearable devices like inertial measurement units (IMUs) and smartphones entail a dependence on their use, most image-based devices like Kinect sensors generate video recordings, which may affect the privacy of the user. Regardless of the device used, most of these FDSs have been tested only in controlled laboratory environments, and there is still no mass-market commercial FDS. This is partly because, for ethical reasons, it is impossible to obtain datasets generated by falls of real older adults. All public datasets generated in the laboratory rely on falls performed by young people, without considering the differences in the acceleration and falling characteristics of older adults. Given the above, this article presents the eHomeSeniors dataset, a new public dataset that is innovative in at least three respects: first, it collects data from two different privacy-friendly infrared thermal sensors; second, it is constructed from two types of volunteers: normal young people (as usual) and performing artists, with the latter group assisted by a physiotherapist to emulate the real fall conditions of older adults; and third, the types of falls selected are the result of a thorough literature review.
Journal Article
PICCOLO White-Light and Narrow-Band Imaging Colonoscopic Dataset: A Performance Comparative of Models and Datasets
2020
Colorectal cancer is one of the world's leading causes of death. Fortunately, an early diagnosis allows for effective treatment, increasing the survival rate. Deep learning techniques have shown their utility for increasing the adenoma detection rate at colonoscopy, but a dataset is usually required so the model can automatically learn the features that characterize polyps. In this work, we present the PICCOLO dataset, which comprises 3433 manually annotated images (2131 white-light images and 1302 narrow-band images), originating from 76 lesions in 40 patients, distributed into training (2203), validation (897) and test (333) sets while ensuring patient independence between sets. Furthermore, clinical metadata are provided for each lesion. Four different models, obtained by combining two backbones and two encoder–decoder architectures, are trained with the PICCOLO dataset and with two other publicly available datasets for comparison. Results are provided for the test set of each dataset. Models trained with the PICCOLO dataset have better generalization capacity, as they perform more uniformly across the test sets of all datasets rather than obtaining the best results only on their own test set. The dataset is available on the website of the Basque Biobank, so it is expected to contribute to the further development of deep learning methods for polyp detection, localisation and classification, which would eventually result in a better and earlier diagnosis of colorectal cancer, hence improving patient outcomes.
Journal Article
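The patient-independent train/validation/test partitioning described for PICCOLO can be sketched with scikit-learn's `GroupShuffleSplit`, which guarantees no patient's images end up in more than one split. The patient ids and split ratios below are illustrative, not the dataset's actual ones:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# 40 "patients" with a variable number of images each, mirroring the
# requirement that train/validation/test sets share no patients.
rng = np.random.default_rng(4)
patients = np.repeat(np.arange(40), rng.integers(5, 15, size=40))
X = np.arange(len(patients))  # stand-in for image indices

# First split off a test set, then split the remainder into train/val,
# always grouping by patient id so splits stay patient-disjoint.
gss = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=0)
trainval_idx, test_idx = next(gss.split(X, groups=patients))
gss2 = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr, va = next(gss2.split(X[trainval_idx], groups=patients[trainval_idx]))
train_idx, val_idx = trainval_idx[tr], trainval_idx[va]

# No patient appears in more than one split.
assert not (set(patients[train_idx]) & set(patients[test_idx]))
assert not (set(patients[train_idx]) & set(patients[val_idx]))
print("patient-independent split OK")
```

A plain random split over images would leak near-duplicate frames of the same lesion across sets; grouping by patient is what makes the reported generalization numbers trustworthy.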
Activity recognition using wearable sensors for tracking the elderly
by van Heemst, Diana; Knobbe, Arno; Paraschiakos, Stylianos
in Accelerometers; Accuracy; Activity recognition
2020
A population group that is often overlooked in the recent revolution of self-tracking is that of older people. This growing proportion of the general population often faces increasing health issues and discomfort. To offer lifestyle advice to the elderly, we need the ability to quantify their lifestyle, before and after an intervention. This research focuses on the task of activity recognition (AR) from accelerometer data. With that aim, we collected a substantial labelled dataset of older individuals wearing multiple devices simultaneously while performing a strict protocol of 16 activities (the GOTOV dataset, N=28). Using this dataset, we trained Random Forest AR models under varying sensor set-ups and levels of activity-description granularity. The model that combines ankle and wrist accelerometers (GENEActiv) produced the best results (accuracy >80%) for 16-class classification. When additional physiological information was used, the accuracy increased further (>85%). To investigate the role of granularity in our predictions, we developed the LARA algorithm, which uses a hierarchical ontology that captures prior biological knowledge to increase or decrease the level of activity granularity (merging classes). As a result, a 12-class model in which the different paces of walking were merged achieved a performance above 93%. Testing this 12-class model on labelled free-living pilot data, the mean balanced accuracy appeared to be reasonably high, and using the LARA algorithm we show that a 7-class model (lying down, sitting, standing, household, walking, cycling, jumping) was optimal in terms of accuracy and granularity. Finally, we demonstrate the use of the latter model on unlabelled free-living data from a larger lifestyle intervention study. In this paper, we make the validation data as well as the derived prediction models available to the community.
Journal Article
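The granularity-merging idea behind the LARA algorithm can be sketched as relabelling fine-grained classes through a small ontology before training a Random Forest. The ontology, synthetic features, and class set below are illustrative assumptions, not the GOTOV labels or the authors' implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy ontology: merge the fine-grained walking paces into one "walking"
# class, mimicking the coarsening step described for LARA.
ONTOLOGY = {"walking_slow": "walking", "walking_normal": "walking",
            "walking_fast": "walking", "sitting": "sitting",
            "standing": "standing", "cycling": "cycling"}

rng = np.random.default_rng(2)
labels = list(ONTOLOGY)
y_fine = rng.choice(labels, size=600)

# Fake accelerometer features loosely tied to each fine class.
centers = {label: i for i, label in enumerate(labels)}
X = rng.normal(loc=[[centers[l]] for l in y_fine], scale=0.5, size=(600, 4))

# Coarsen the labels through the ontology, then train as usual.
y_coarse = np.array([ONTOLOGY[l] for l in y_fine])
Xtr, Xte, ytr, yte = train_test_split(X, y_coarse, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
print(clf.score(Xte, yte) > 0.8)  # True on this easily separable toy data
```

Merging classes that the sensors cannot reliably distinguish (the walking paces) trades label granularity for accuracy, which is exactly the 16-class versus 12-class versus 7-class trade-off the abstract reports.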
A Benchmark Dataset for RSVP-Based Brain–Computer Interfaces
2020
This paper reports a benchmark dataset acquired with a brain-computer interface (BCI) system based on the rapid serial visual presentation (RSVP) paradigm. The dataset consists of 64-channel electroencephalogram (EEG) data from 64 healthy subjects (sub1, …, sub64) recorded while they performed a target-image detection task. For each subject, the data comprise 2 groups (‘A’ and ‘B’). Each group contains 2 blocks, and each block includes 40 trials corresponding to 40 stimulus sequences. Each sequence contains 100 images presented at 10 Hz (10 images per second). The stimulus images were street-view images of two categories: target images containing humans and non-target images without humans. Target images were presented randomly in the stimulus sequence with a probability of 1–4%. During stimulus presentation, subjects were asked to search for the target images and ignore the non-target images. To preserve all original information, the dataset contains the raw continuous data without any processing. On one hand, the dataset can be used as a benchmark for comparing target-identification algorithms in RSVP-based BCIs. On the other hand, it can be used to design new system diagrams and evaluate their BCI performance through offline simulation, without collecting any new data. Furthermore, the dataset provides high-quality data for characterizing and modeling event-related potentials (ERPs) and steady-state visual evoked potentials (SSVEPs) in RSVP-based BCIs. The dataset is freely available from http://bci.med.tsinghua.edu.cn/download.html.
Journal Article
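Since this RSVP dataset ships as raw continuous data, a typical first processing step is cutting the recording into per-stimulus epochs around each image onset for ERP analysis. The sampling rate, epoch window, and simulated signal below are assumptions for illustration; the dataset's actual recording rate is not stated in the abstract:

```python
import numpy as np

FS = 250             # sampling rate (Hz); assumed for this sketch
EPOCH = (-0.2, 0.8)  # window in seconds around each stimulus onset

def epoch_raw(raw, onsets, fs=FS, window=EPOCH):
    """Cut a (channels, samples) continuous recording into per-stimulus
    epochs; `onsets` are stimulus sample indices. Assumes every window
    lies fully inside the recording."""
    lo, hi = (int(round(w * fs)) for w in window)
    return np.stack([raw[:, s + lo:s + hi] for s in onsets])

# Toy continuous signal: 4 channels, 30 s; stimuli every 0.1 s (10 Hz RSVP).
raw = np.random.default_rng(3).normal(size=(4, 30 * FS))
onsets = np.arange(int(0.5 * FS), int(29 * FS), FS // 10)  # 10 images/s
epochs = epoch_raw(raw, onsets)
print(epochs.shape[1:])  # → (4, 250): channels x samples per epoch
```

Averaging the epochs time-locked to target versus non-target onsets is then what exposes the ERPs (and, across the 10 Hz stream, the SSVEP component) mentioned in the abstract.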