Catalogue Search | MBRL

Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks

by Heintz, Fredrik , Ramachandranpillai, Resmi , Sikder, Md Fahim in Bias , Electronic health records , Fair data generation

2024

Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. In order to tackle spurious correlations (i), we propose an information-constrained Data Generation Process (DGP) that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel and professional approach to addressing the limitations of synthetic data generation in the healthcare domain. By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.

Journal Article

Share this book

Add to My Shelf

Bangla Handwritten Digit Recognition and Generation

by Md Fahim Sikder in Handwriting recognition , Pattern recognition

2021

Handwritten digit or numeral recognition is one of the classical issues in the area of pattern recognition and has seen tremendous advancement because of the recent wide availability of computing resources. Plentiful works have already done on English, Arabic, Chinese, Japanese handwritten script. Some work on Bangla also have been done but there is space for development. From that angle, in this paper, an architecture has been implemented which achieved the validation accuracy of 99.44% on BHAND dataset and outperforms Alexnet and Inception V3 architecture. Beside digit recognition, digit generation is another field which has recently caught the attention of the researchers though not many works have been done in this field especially on Bangla. In this paper, a Semi-Supervised Generative Adversarial Network or SGAN has been applied to generate Bangla handwritten numerals and it successfully generated Bangla digits.

Paper

Share this book

Add to My Shelf

TransFusion: Generating Long, High Fidelity Time Series using Diffusion Models with Transformers

by Heintz, Fredrik , Ramachandranpillai, Resmi , Md Fahim Sikder in Artificial neural networks , Generative adversarial networks , Sequences

2024

The generation of high-quality, long-sequenced time-series data is essential due to its wide range of applications. In the past, standalone Recurrent and Convolutional Neural Network-based Generative Adversarial Networks (GAN) were used to synthesize time-series data. However, they are inadequate for generating long sequences of time-series data due to limitations in the architecture. Furthermore, GANs are well known for their training instability and mode collapse problem. To address this, we propose TransFusion, a diffusion, and transformers-based generative model to generate high-quality long-sequence time-series data. We have stretched the sequence length to 384, and generated high-quality synthetic data. Also, we introduce two evaluation metrics to evaluate the quality of the synthetic data as well as its predictive characteristics. We evaluate TransFusion with a wide variety of visual and empirical metrics, and TransFusion outperforms the previous state-of-the-art by a significant margin.

Paper

Share this book

Add to My Shelf

Fair4Free: Generating High-fidelity Fair Synthetic Samples using Data Free Distillation

by de Leng, Daniel , Heintz, Fredrik , Md Fahim Sikder in Datasets , Distillation , Image quality

2024

This work presents Fair4Free, a novel generative model to generate synthetic fair data using data-free distillation in the latent space. Fair4Free can work on the situation when the data is private or inaccessible. In our approach, we first train a teacher model to create fair representation and then distil the knowledge to a student model (using a smaller architecture). The process of distilling the student model is data-free, i.e. the student model does not have access to the training dataset while distilling. After the distillation, we use the distilled model to generate fair synthetic samples. Our extensive experiments show that our synthetic samples outperform state-of-the-art models in all three criteria (fairness, utility and synthetic quality) with a performance increase of 5% for fairness, 8% for utility and 12% in synthetic quality for both tabular and image datasets.

Paper

Share this book

Add to My Shelf

FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability

by de Leng, Daniel , Heintz, Fredrik , Ramachandranpillai, Resmi in Benchmarks , Datasets , Open source software

2024

We present FairX, an open-source Python-based benchmarking tool designed for the comprehensive analysis of models under the umbrella of fairness, utility, and eXplainability (XAI). FairX enables users to train benchmarking bias-removal models and evaluate their fairness using a wide array of fairness metrics, data utility metrics, and generate explanations for model predictions, all within a unified framework. Existing benchmarking tools do not have the way to evaluate synthetic data generated from fair generative models, also they do not have the support for training fair generative models either. In FairX, we add fair generative models in the collection of our fair-model library (pre-processing, in-processing, post-processing) and evaluation metrics for evaluating the quality of synthetic fair data. This version of FairX supports both tabular and image datasets. It also allows users to provide their own custom datasets. The open-source FairX benchmarking package is publicly available at https://github.com/fahim-sikder/FairX.

Paper

Share this book

Add to My Shelf

Generating Synthetic Fair Syntax-agnostic Data by Learning and Distilling Fair Representation

by de Leng, Daniel , Heintz, Fredrik , Ramachandranpillai, Resmi in Bias , Distillation , Knowledge representation

2024

Data Fairness is a crucial topic due to the recent wide usage of AI powered applications. Most of the real-world data is filled with human or machine biases and when those data are being used to train AI models, there is a chance that the model will reflect the bias in the training data. Existing bias-mitigating generative methods based on GANs, Diffusion models need in-processing fairness objectives and fail to consider computational overhead while choosing computationally-heavy architectures, which may lead to high computational demands, instability and poor optimization performance. To mitigate this issue, in this work, we present a fair data generation technique based on knowledge distillation, where we use a small architecture to distill the fair representation in the latent space. The idea of fair latent space distillation enables more flexible and stable training of Fair Generative Models (FGMs). We first learn a syntax-agnostic (for any data type) fair representation of the data, followed by distillation in the latent space into a smaller model. After distillation, we use the distilled fair latent space to generate high-fidelity fair synthetic data. While distilling, we employ quality loss (for fair distillation) and utility loss (for data utility) to ensure that the fairness and data utility characteristics remain in the distilled latent space. Our approaches show a 5%, 5% and 10% rise in performance in fairness, synthetic sample quality and data utility, respectively, than the state-of-the-art fair generative model.

Paper

Share this book

Add to My Shelf

Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks

by Heintz, Fredrik , Bergström, David , Ramachandranpillai, Resmi in Bias , Electronic health records , Generative adversarial networks

2024

Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. In order to tackle spurious correlations (i), we propose an information-constrained Data Generation Process that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. We conduct extensive experiments using the MIMIC-III database. Our results demonstrate that Bt-GAN achieves SOTA accuracy while significantly improving fairness and minimizing bias amplification. We also perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel and professional approach to addressing the limitations of synthetic data generation in the healthcare domain. By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.

Paper

Share this book

Add to My Shelf

Validation and Evaluation of the Psychometric Properties of the Bangla Version of the Brief Pornography Screen in Men and Women

by Potenza, Marc N. , Bőthe, Beáta , Sikder, Md. Tajuddin in Addictions , Addictive behaviors , Adults

2024

With the growth of Internet pornography and the impacts of problematic pornography use (PPU), the validation of short and psychometrically sound screening instruments in different cultures is needed to identify individuals with PPU who may benefit from intervention. This study sought to validate the Bangla version of the Brief Pornography Screen (BPS) and examine its psychometric properties in a large adult sample of men and women. Using an online platform in Bangladesh, we recruited 6023 participants (39.9% women, M age = 22.7 years; SD = 5.0) with lifetime pornography use for the study. To examine the psychometric properties of the BPS, confirmatory factor analysis and gender-based measurement invariance testing were conducted. We assessed theoretically relevant correlates (i.e., self-perceived PPU, depressive and anxiety symptoms) as elements of validity for the scale. The findings suggested that the Bangla BPS version demonstrated strong psychometric properties in terms of factor structure, reliability (i.e., Cronbach’s alpha = .86, composite reliability = .93), and measurement invariance (i.e., men vs. women). The BPS score positively correlated with self-perceived PPU, and anxiety and depressive symptoms, supporting elements of discriminant validity. The Bangla BPS is a reliable and valid self-report scale to assess PPU in the Bangladeshi cultural context.

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter