31 result(s) for "Fair data generation"
Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks
Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. In order to tackle spurious correlations (i), we propose an information-constrained Data Generation Process (DGP) that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel and professional approach to addressing the limitations of synthetic data generation in the healthcare domain. By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications.
TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks
With the increasing reliance on automated decision making, the issue of algorithmic fairness has gained increasing importance. In this paper, we propose a Generative Adversarial Network for tabular data generation. The model includes two phases of training. In the first phase, the model is trained to accurately generate synthetic data similar to the reference dataset. In the second phase, we modify the value function to add a fairness constraint and continue training the network to generate data that is both accurate and fair. We test our results in both the unconstrained and the constrained case of fair data generation. We show that, using a fairly simple architecture and applying quantile transformation of numerical attributes, the model achieves promising performance. In the unconstrained case, i.e., when the model is only trained in the first phase and is only meant to generate accurate data following the same joint probability distribution as the real data, the results show that the model beats the state-of-the-art GANs proposed in the literature to produce synthetic tabular data. Furthermore, in the constrained case, in which the first phase of training is followed by the second phase, we train the network and test it on four datasets studied in the fairness literature, compare our results with another state-of-the-art pre-processing method, and present the promising results that it achieves. Compared to other studies utilizing GANs for fair data generation, our model is comparatively more stable because it uses only one critic and avoids major problems of the original GAN model, such as mode-dropping and non-convergence.
The Type 1 Diabetes T Cell Receptor and B Cell Receptor Repository in the AIRR Data Commons: a practical guide for access, use and contributions through the Type 1 Diabetes AIRR Consortium
Human molecular genetics has brought incredible insights into the variants that confer risk for the development of tissue-specific autoimmune diseases, including type 1 diabetes. The hallmark cell-mediated immune destruction that is characteristic of type 1 diabetes is closely linked with risk conferred by the HLA class II gene locus, in combination with a broad array of additional candidate genes influencing islet-resident beta cells within the pancreas, as well as function, phenotype and trafficking of immune cells to tissues. In addition to the well-studied germline SNP variants, there are critical contributions conferred by T cell receptor (TCR) and B cell receptor (BCR) genes that undergo somatic recombination to yield the Adaptive Immune Receptor Repertoire (AIRR) responsible for autoimmunity in type 1 diabetes. We therefore created the T1D TCR/BCR Repository (The Type 1 Diabetes T Cell Receptor and B Cell Receptor Repository) to study these highly variable and dynamic gene rearrangements. In addition to processed TCR and BCR sequences, the T1D TCR/BCR Repository includes detailed metadata (e.g. participant demographics, disease-associated parameters and tissue type). We introduce the Type 1 Diabetes AIRR Consortium goals and outline methods to use and deposit data to this comprehensive repository. Our ultimate goal is to facilitate research community access to rich, carefully annotated immune AIRR datasets to enable new scientific inquiry and insight into the natural history and pathogenesis of type 1 diabetes.
Sensing User Intent: An LLM-Powered Agent for On-the-Fly Personalized Virtual Space Construction from UAV Sensor Data
The proliferation of Unmanned Aerial Vehicles (UAVs) enables the large-scale collection of ecological data, yet translating this dynamic sensor data into engaging, personalized public experiences remains a significant challenge. Existing solutions fall short: static exhibitions lack adaptability, while general-purpose LLM agents struggle with real-time responsiveness and reliability. To address this, we introduce CurationAgent, a novel intelligent agent built upon the State-Gated Agent Architecture (SGAA). Its core innovation is an advanced hybrid curation pipeline that synergizes Retrieval-Augmented Generation (RAG) for broad semantic recall with an Intent-Driven Curation (IDC) Funnel for precise intent formalization and narrative synthesis. This hybrid model robustly translates user intent into a curated, multi-modal narrative. We validate this framework in a proof-of-concept virtual exhibition of the Lalu Wetland’s biodiversity. Our comprehensive evaluation demonstrates that CurationAgent is significantly more responsive (1512 ms vs. 4301 ms), reliable (95% vs. 57% task success), and precise (85.5% vs. 52.7% query precision) than standard agent architectures. Furthermore, a user study with 27 participants confirmed our system leads to measurably higher user engagement. This work contributes a robust and responsive agent architecture that validates a new paradigm for interactive systems, shifting from passive information retrieval to active, partnered experience curation.
A Photovoltaic System Model Integrating FAIR Digital Objects and Ontologies
Smart grids of the future will create and provide huge data volumes, which are subject to FAIR (Findable, Accessible, Interoperable, and Reusable) data management solutions when used within the scientific domain and for operation. FAIR Digital Objects (FDOs) provide access to (meta)data, and ontologies explicitly describe metadata as well as application data objects and domains. The present paper proposes a novel approach to integrate FAIR digital objects and ontologies as metadata models in order to support data access for energy researchers, energy research applications, operational applications and energy information systems. As the first example domain to be modeled using an ontology and to be integrated with FAIR digital objects, a photovoltaic (PV) system model is selected. For the given purpose, a discussion of existing energy ontologies shows the necessity to develop a new PV ontology. By integration of FDOs, this new PV ontology is introduced in the present paper. Furthermore, the concept of FDOs is integrated with the PV ontology in such a way that it allows for generalization. In this way, the present paper contributes to sustainable data management for smart grid operation, especially for interoperability, by using ontologies and, hence, unambiguous semantics. An information system application that visualizes the PV system, its descriptive data, and collected sensor data is proposed. As a proof of concept, the details of the use case implementation are presented.
The NIAID Discovery Portal: a unified search engine for infectious and immune-mediated disease datasets
Valuable data sets are often overlooked because they are difficult to locate. The NIAID Data Ecosystem Discovery Portal fills this gap by providing a centralized, searchable interface that empowers users with varying levels of technical expertise to find and reuse data. By standardizing key metadata fields and harmonizing heterogeneous formats, the Portal improves data findability, accessibility, and reusability. This resource supports hypothesis generation, comparative analysis, and secondary use of public data by the IID research community, including those funded by NIAID. The Portal supports data sharing by standardizing metadata and linking to source repositories and maximizes the impact of public investment in research data by supporting scientific advancement via secondary use.
Ad Hoc Data Foraging in a Life Sciences Community Ecosystem Using SoDa
Biologists often set out to find relevant data in an ever-changing landscape of interesting databases. While leading journals publish descriptions of databases, these descriptions are usually not recent, and the lists are not frequently updated to discard defunct or poor-quality databases. These indices usually include databases that are proactively requested to be included by their authors. The challenge for individual biologists, then, is to discover, explore, and select databases of interest from a large unorganized collection and effectively use them in their analysis without too large an investment. Advocacy of the FAIR data principles to improve searching, finding, accessing, and inter-operating among these diverse information sources in order to increase usability is proving to be a difficult proposition and, consequently, a large number of data sources are not FAIR-compliant. Since linked open data do not guarantee FAIRness, biologists are now left to individually search for information in open networks. In this paper, we propose SoDa, for intelligent data foraging on the internet by biologists. SoDa helps biologists to discover resources based on analysis requirements and generate resource access plans, as well as storing cleaned data and knowledge for community use. SoDa includes a natural language-powered resource discovery tool, a tool to retrieve data from remote databases, organize and store collected data, query stored data, and seek help from the community when things do not work as anticipated. A secondary search index is also supported for community members to find archived information in a convenient way to enable its reuse. The features supported in SoDa endow biologists with data integration capabilities over arbitrary linked open databases and let them construct powerful computational pipelines using them, capabilities that are not supported in most contemporary biological workflow systems, such as Taverna or Galaxy.
X-search: an open access interface for cross-cohort exploration of the National Sleep Research Resource
Background: The National Sleep Research Resource (NSRR) is a large-scale, openly shared, data repository of de-identified, highly curated clinical sleep data from multiple NIH-funded epidemiological studies. Although many data repositories allow users to browse their content, few support fine-grained, cross-cohort query and exploration at study-subject level. We introduce a cross-cohort query and exploration system, called X-search, to enable researchers to query patient cohort counts across a growing number of completed, NIH-funded studies in NSRR and explore the feasibility or likelihood of reusing the data for research studies. Methods: X-search has been designed as a general framework with two loosely-coupled components: semantically annotated data repository and cross-cohort exploration engine. The semantically annotated data repository is comprised of a canonical data dictionary, data sources with a data dictionary, and mappings between each individual data dictionary and the canonical data dictionary. The cross-cohort exploration engine consists of five modules: query builder, graphical exploration, case-control exploration, query translation, and query execution. The canonical data dictionary serves as the unified metadata to drive the visual exploration interfaces and facilitate query translation through the mappings. Results: X-search is publicly available at https://www.x-search.net/ with nine NSRR datasets consisting of over 26,000 unique subjects. The canonical data dictionary contains over 900 common data elements across the datasets. X-search has received over 1800 cross-cohort queries by users from 16 countries. Conclusions: X-search provides a powerful cross-cohort exploration interface for querying and exploring heterogeneous datasets in the NSRR data repository, so as to enable researchers to evaluate the feasibility of potential research studies and generate potential hypotheses using the NSRR data.
Facilitating accessible, rapid, and appropriate processing of ancient metagenomic data with AMDirT version 2; peer review: 1 approved, 3 approved with reservations
Background: Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.org) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications. Methods: Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate rapid data filtering and downloading of ancient metagenomic data, as well as improving automated metadata curation and validation for AncientMetagenomeDir. Results: AncientMetagenomeDir was extended to include standardised metadata of over 6000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community. Conclusions: Together, both standardised metadata reporting and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.
A global fairtrade partnership needed to address injustices in the supply chains of clean energy technology materials
Renewable sources produced close to one-third of the world’s electricity in 2023. However, a limited but growing body of research suggests rapid renewable energy development is leading to conflict and resource exploitation in energy-transitioning communities. Such injustices are attributable to the extractivist nature of renewable energy development, where raw materials, also known as Clean Energy Technology Materials (CETMs), are in limited quantities and often concentrated in resource-constrained zones in the Global South. In this perspective, we call for urgent energy justice considerations in the CETM supply chain. We used demand projection data from 2020 to 2040 to look into the effects of important CETMs like nickel, cobalt, and lithium on distributive justice. We also examined the potential of these effects to tackle systemic injustices such as conflict, labor exploitation, and transactional colonialism. Next, we analyzed global mining production data from the United States Geological Survey using a CETM life cycle lens and found that increasing demand for these materials is exacerbating restorative injustices, particularly in the Global South. Finally, building on the above evidence, we called for the creation of multi-stakeholder partnerships and the establishment of fair trade standards across the critical CETM supply chain. Highlights: Here, we analyzed the projected demand growth for selected clean energy technology materials by 2040 relative to 2020 levels using data from the International Energy Agency, visualized their global mining production using data from the United States Geological Survey, explained how the demand for these materials is exacerbating certain injustices, and recommended multi-stakeholder partnerships across the supply chain of these materials. Discussion: The rapid growth of renewable energy technologies is creating injustices throughout the supply chain of clean energy technology materials (CETM). A lack of any energy justice framework across CETMs’ extraction, processing, decommissioning, and recycling is exacerbating restorative injustices, especially in the Global South. By examining the projected demands and geospatial patterns for the extraction of minerals, metals, and other materials essential for clean energy technology development, the inequities faced by impoverished, marginalized, and Indigenous communities become apparent. We argue that if coffee can have fair trade standards across its supply chain, why can’t we have similar considerations for the CETMs? There is a need to include transparency in the sustainability, ethics, and energy efficiency of CETM extraction and processing through global partnerships across its supply chain.