Catalogue Search | MBRL
Explore the vast range of titles available.
467 result(s) for "Synthetic data generation"
Synthetic Generation of Passive Infrared Motion Sensor Data Using a Game Engine
by Olsson, Carl Magnus; Karlsson, Fredrik; Persson, Magnus
in Automation, Cameras, Computer simulation
2021
Quantifying the number of occupants in an indoor space is useful for a wide variety of applications. Attempts have been made at solving the task using passive infrared (PIR) motion sensor data together with supervised learning methods. Collecting a large labeled dataset containing both PIR motion sensor data and ground truth people count is however time-consuming, often requiring one hour of observation for each hour of data gathered. In this paper, a method is proposed for generating such data synthetically. A simulator is developed in the Unity game engine capable of producing synthetic PIR motion sensor data by detecting simulated occupants. The accuracy of the simulator is tested by replicating a real-world meeting room inside the simulator and conducting an experiment where a set of choreographed movements are performed in the simulated environment as well as the real room. In 34 out of 50 tested situations, the output from the simulated PIR sensors is comparable to the output from the real-world PIR sensors. The developed simulator is also used to study how a PIR sensor’s output changes depending on where in a room a motion is carried out. Through this, the relationship between sensor output and spatial position of a motion is discovered to be highly non-linear, which highlights some of the difficulties associated with mapping PIR data to occupancy count.
Journal Article
Survey on Synthetic Data Generation, Evaluation Methods and GANs
2022
Synthetic data consists of artificially generated data. When data are scarce, or of poor quality, synthetic data can be used, for example, to improve the performance of machine learning models. Generative adversarial networks (GANs) are state-of-the-art deep generative models that can generate novel synthetic samples that follow the underlying data distribution of the original dataset. Reviews on synthetic data generation and on GANs have already been written. However, none in the relevant literature, to the best of our knowledge, has explicitly combined these two topics. This survey aims to fill this gap and provide useful material to new researchers in this field. That is, we aim to provide a survey that combines synthetic data generation and GANs, and that can act as a strong starting point for new researchers in the field, so that they have a general overview of the key contributions and useful references. We have conducted a review of the state of the art by querying four major databases: Web of Science (WoS), Scopus, IEEE Xplore, and ACM Digital Library. This allowed us to gain insights into the most relevant authors, the most relevant scientific journals in the area, the most cited papers, the most significant research areas, the most important institutions, and the most relevant GAN architectures. GANs were thoroughly reviewed, as well as their most common training problems and their most important breakthroughs, with a focus on GAN architectures for tabular data. Further, the main algorithms for generating synthetic data, their applications and our thoughts on these methods are also expressed. Finally, we reviewed the main techniques for evaluating the quality of synthetic data (especially tabular data) and provided a schematic overview of the information presented in this paper.
Journal Article
Generation and evaluation of synthetic patient data
by Stevens, Jennifer; Goncalves, Andre; Soper, Braden
in BASIC BIOLOGICAL SCIENCES, Cancer patient data, Cancer research
2020
Background
Machine learning (ML) has made a significant impact in medicine and cancer research; however, its impact in these areas has been undeniably slower and more limited than in other application domains. A major reason for this has been the lack of availability of patient data to the broader ML research community, in large part due to patient privacy protection concerns. High-quality, realistic, synthetic datasets can be leveraged to accelerate methodological developments in medicine. By and large, medical data is high dimensional and often categorical. These characteristics pose multiple modeling challenges.
Methods
In this paper, we evaluate three classes of synthetic data generation approaches: probabilistic models, classification-based imputation models, and generative adversarial neural networks. Metrics for evaluating the quality of the generated synthetic datasets are presented and discussed.
Results
While the results and discussions are broadly applicable to medical data, for demonstration purposes we generate synthetic datasets for cancer based on the publicly available cancer registry data from the Surveillance, Epidemiology, and End Results (SEER) program. Specifically, our cohort consists of breast, respiratory, and non-solid cancer cases diagnosed between 2010 and 2015, which includes over 360,000 individual cases.
Conclusions
We discuss the trade-offs of the different methods and metrics, providing guidance on considerations for the generation and usage of medical synthetic data.
Journal Article
A Review of Synthetic Image Data and Its Use in Computer Vision
2022
Development of computer vision algorithms using convolutional neural networks and deep learning has necessitated ever greater amounts of annotated and labelled data to produce high performance models. Large, public data sets have been instrumental in pushing forward computer vision by providing the data necessary for training. However, many computer vision applications cannot rely on general image data provided in the available public datasets to train models, instead requiring labelled image data that is not readily available in the public domain on a large scale. At the same time, acquiring such data from the real world can be difficult and costly, and labelling it in large quantities is labour-intensive. Because of this, synthetic image data has been pushed to the forefront as a potentially faster and cheaper alternative to collecting and annotating real data. This review provides a general overview of types of synthetic image data, as categorised by synthesised output, common methods of synthesising different types of image data, existing applications and logical extensions, performance of synthetic image data in different applications and the associated difficulties in assessing data performance, and areas for further research.
Journal Article
Pixel-Wise Crowd Understanding via Synthetic Data
2021
Crowd analysis via computer vision techniques is an important topic in the field of video surveillance, with widespread applications including crowd monitoring, public safety, and space design. Pixel-wise crowd understanding is the most fundamental task in crowd analysis because it yields finer results for video sequences or still images than other analysis tasks. Unfortunately, pixel-level understanding needs a large amount of labeled training data. Annotating such data is expensive, so current crowd datasets are small. As a result, most algorithms suffer from over-fitting to varying degrees. In this paper, taking crowd counting and segmentation as examples of pixel-wise crowd understanding, we attempt to remedy these problems from two aspects, namely data and methodology. Firstly, we develop a free data collector and labeler to generate synthetic, labeled crowd scenes in a computer game, Grand Theft Auto V. We then use it to construct a large-scale, diverse synthetic crowd dataset, named the “GCC Dataset”. Secondly, we propose two simple methods to improve the performance of crowd understanding by exploiting the synthetic data. Specifically, (1) supervised crowd understanding: pre-train a crowd analysis model on the synthetic data, then fine-tune it using the real data and labels, which makes the model perform better in the real world; (2) crowd understanding via domain adaptation: translate the synthetic data to photo-realistic images, then train the model on the translated data and labels. As a result, the trained model works well in real crowd scenes. Extensive experiments verify that the supervised algorithm outperforms the state of the art on four real datasets: UCF_CC_50, UCF-QNRF, and Shanghai Tech Part A/B. These results show the effectiveness and value of the synthetic GCC dataset for pixel-wise crowd understanding.
The tools for collecting and labeling data, the proposed synthetic dataset, and the source code for the counting models are available at https://gjy3035.github.io/GCC-CL/.
Journal Article
Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks
by Heintz, Fredrik; Ramachandranpillai, Resmi; Sikder, Md Fahim
in Bias, Electronic health records, Fair data generation
2024
Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. In order to tackle spurious correlations (i), we propose an information-constrained Data Generation Process (DGP) that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel and professional approach to addressing the limitations of synthetic data generation in the healthcare domain. 
By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.
Journal Article
Generative Pre-Trained Transformer (GPT) in Research: A Systematic Review on Data Augmentation
2024
GPT (Generative Pre-trained Transformer) represents advanced language models that have significantly reshaped the academic writing landscape. These sophisticated language models offer invaluable support throughout all phases of research work, facilitating idea generation, enhancing drafting processes, and overcoming challenges like writer’s block. Their capabilities extend beyond conventional applications, contributing to critical analysis, data augmentation, and research design, thereby elevating the efficiency and quality of scholarly endeavors. Strategically narrowing its focus, this review explores alternative dimensions of GPT and LLM applications, specifically data augmentation and the generation of synthetic data for research. Employing a meticulous examination of 412 scholarly works, it distills a selection of 77 contributions addressing three critical research questions: (1) GPT on Generating Research data, (2) GPT on Data Analysis, and (3) GPT on Research Design. The systematic literature review adeptly highlights the central focus on data augmentation, encapsulating 48 pertinent scholarly contributions, and extends to the proactive role of GPT in critical analysis of research data and shaping research design. Pioneering a comprehensive classification framework for “GPT’s use on Research Data”, the study classifies existing literature into six categories and 14 sub-categories, providing profound insights into the multifaceted applications of GPT in research data. This study meticulously compares 54 pieces of literature, evaluating research domains, methodologies, and advantages and disadvantages, providing scholars with profound insights crucial for the seamless integration of GPT across diverse phases of their scholarly pursuits.
Journal Article
Functional assessment of bidirectional cortical and peripheral neural control on heartbeat dynamics: A brain-heart study on thermal stress
by Barbieri, Riccardo; Candia-Rivera, Diego; Catrambone, Vincenzo
in Brain, Brain-heart interplay, Cognitive ability
2022
• We propose a new framework to assess neural dynamics involved in heartbeat control.
• The modeling is based on coupled synthetic data generators of EEG and RR series.
• Cardiac sympathovagal activity is modelled through Laguerre expansions of RR series.
• Time-varying directional brain-heart interplay is quantified under thermal stress.
The study of functional Brain-Heart Interplay (BHI) from non-invasive recordings has gained much interest in recent years. Previous endeavors aimed at understanding how the two dynamical systems exchange information, providing novel holistic biomarkers and important insights on essential cognitive aspects and neural system functioning. However, the interplay between cardiac sympathovagal and cortical oscillations still has much room for further investigation. In this study, we introduce a new computational framework for a functional BHI assessment, namely the Sympatho-Vagal Synthetic Data Generation Model, combining cortical (electroencephalography, EEG) and peripheral (cardiac sympathovagal) neural dynamics. The causal, bidirectional neural control on heartbeat dynamics was quantified on data gathered from 26 human volunteers undergoing a cold-pressor test. Results show that thermal stress induces heart-to-brain functional interplay sustained by EEG oscillations in the delta and gamma bands, primarily originating from sympathetic activity, whereas brain-to-heart interplay originates over central brain regions through sympathovagal control. The proposed methodology provides a viable computational tool for the functional assessment of the causal interplay between cortical and cardiac neural control.
Journal Article
Deep Convolutional Generative Adversarial Networks to Enhance Artificial Intelligence in Healthcare: A Skin Cancer Application
by Torti, Emanuele; Callico, Gustavo M; Fabelo, Himar
in Algorithms, Architecture, Artificial intelligence
2022
In recent years, researchers have designed several artificial intelligence solutions for healthcare applications, which usually evolved into functional solutions for clinical practice. Furthermore, deep learning (DL) methods are well-suited to process the large amounts of data acquired by wearable devices, smartphones, and other sensors employed in different medical domains. Conceived to serve as a diagnostic tool and for surgical guidance, hyperspectral imaging emerged as a non-contact, non-ionizing, and label-free technology. However, the lack of large datasets to efficiently train the models limits DL applications in the medical field. Hence, its usage with hyperspectral images is still at an early stage. We propose a deep convolutional generative adversarial network to generate synthetic hyperspectral images of epidermal lesions, targeting skin cancer diagnosis, and to overcome the challenge of training DL architectures on small datasets. Experimental results show the effectiveness of the proposed framework, which is capable of generating synthetic data to train DL classifiers.
Journal Article
Generation of synthetic manufacturing datasets for machine learning using discrete-event simulation
by Chan, K. C.; Rabaev, Marsel; Pratama, Handy
in Datasets, Discrete-event simulation, Machine learning
2022
Recent advances in computing power have seen machine learning becoming an area of significant interest in manufacturing for scholars attempting to realise its full potential. Successful machine learning applications require a great amount of specific production data that is neither easily nor publicly accessible. This study aims to develop a framework to use discrete-event simulation (DES) to generate large datasets for training machine learning models. Three DES models were designed and executed to generate synthetic production data for different manufacturing scenarios. Inferences were made on the dependency between the time required to generate data and the complexity of the simulation model. The experimental results show that with the incremental changes in the simulation model, the time required to generate synthetic data tends to increase. The study revealed that DES is an effective tool for generating high-quality synthetic data which can be fed into machine learning models for training. The datasets generated by the simulations are made publicly available.
Journal Article
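Illustrative note: the abstract above describes using discrete-event simulation to produce synthetic production data. The Python sketch below shows the general idea only and is not the authors' framework or any of their three models. The function name `simulate_station`, the single-workstation layout, and the exponential inter-arrival and processing times are all assumptions made for illustration.

```python
import heapq
import random
from collections import deque


def simulate_station(n_jobs, mean_interarrival=5.0, mean_service=4.0, seed=0):
    """Simulate one single-server workstation (hypothetical example).

    Returns one dict per job with arrival, start, finish and waiting time.
    Exponential timings are an assumption, not taken from the cited study.
    """
    rng = random.Random(seed)
    events = []  # priority queue of (time, sequence, kind, job_id)
    seq = 0      # unique tie-breaker so equal times never compare further

    # Schedule all arrivals up front.
    t = 0.0
    for job_id in range(n_jobs):
        t += rng.expovariate(1.0 / mean_interarrival)
        heapq.heappush(events, (t, seq, "arrival", job_id))
        seq += 1

    queue = deque()  # jobs waiting for the machine, first in first out
    busy = False
    arrival, started = {}, {}
    records = []

    # Main event loop: always process the earliest pending event.
    while events:
        now, _, kind, job_id = heapq.heappop(events)
        if kind == "arrival":
            arrival[job_id] = now
            queue.append(job_id)
        else:  # departure: the machine is free again, log the finished job
            busy = False
            records.append({
                "job_id": job_id,
                "arrival": arrival[job_id],
                "start": started[job_id],
                "finish": now,
                "wait": started[job_id] - arrival[job_id],
            })
        # Start the next waiting job if the machine is idle.
        if not busy and queue:
            nxt = queue.popleft()
            started[nxt] = now
            busy = True
            finish = now + rng.expovariate(1.0 / mean_service)
            heapq.heappush(events, (finish, seq, "departure", nxt))
            seq += 1

    return records
```

Calling `simulate_station(1000, seed=42)` returns 1000 job records that could be written to a CSV file and used as a toy training set, for example to predict waiting time from arrival patterns. A fixed seed keeps the generated dataset reproducible.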