874 results for "Data harmonization"
The quest for seafloor macrolitter: a critical review of background knowledge, current methods and future prospects
The seafloor covers some 70% of the Earth’s surface and has been recognised as a major sink for marine litter. Still, litter on the seafloor is the least investigated fraction of marine litter, which is not surprising as most of it lies in the deep sea, i.e. the least explored ecosystem. Although marine litter is considered a major threat for the oceans, monitoring frameworks are still being set up. This paper reviews current knowledge and methods, identifies existing needs, and points to future developments that are required to address the estimation of seafloor macrolitter. It provides background knowledge and conveys the views and thoughts of scientific experts on seafloor marine litter offering a review of monitoring and ocean modelling techniques. Knowledge gaps that need to be tackled, data needs for modelling, and data comparability and harmonisation are also discussed. In addition, it shows how research on seafloor macrolitter can inform international protection and conservation frameworks to prioritise efforts and measures against marine litter and its deleterious impacts.
How European Research Projects Can Support Vaccination Strategies: The Case of the ORCHESTRA Project for SARS-CoV-2
ORCHESTRA (“Connecting European Cohorts to Increase Common and Effective Response To SARS-CoV-2 Pandemic”) is an EU-funded project which aims to help rapidly advance the knowledge related to the prevention of the SARS-CoV-2 infection and the management of COVID-19 and its long-term sequelae. Here, we describe the early results of this project, focusing on the strengths of multiple, international, historical and prospective cohort studies and highlighting those results which are of potential relevance for vaccination strategies, such as the necessity of a vaccine booster dose after a primary vaccination course in hematologic cancer patients and in solid organ transplant recipients to elicit a higher antibody titer, and the protective effect of vaccination on severe COVID-19 clinical manifestation and on the emergence of post-COVID-19 conditions. Valuable data regarding epidemiological variations, risk factors of SARS-CoV-2 infection and its sequelae, and vaccination efficacy in different subpopulations can support further defining public health vaccination policies.
Estimating prevalence of subjective cognitive decline in and across international cohort studies of aging: a COSMIC study
Background: Subjective cognitive decline (SCD) is recognized as a risk stage for Alzheimer’s disease (AD) and other dementias, but its prevalence is not well known. We aimed to use uniform criteria to better estimate SCD prevalence across international cohorts. Methods: We combined individual participant data for 16 cohorts from 15 countries (members of the COSMIC consortium) and used qualitative and quantitative (Item Response Theory/IRT) harmonization techniques to estimate SCD prevalence. Results: The sample comprised 39,387 cognitively unimpaired individuals above age 60. The prevalence of SCD across studies was around one quarter with both qualitative harmonization/QH (23.8%, 95% CI = 23.3–24.4%) and IRT (25.6%, 95% CI = 25.1–26.1%); however, prevalence estimates varied widely between studies (QH: 6.1%, 95% CI = 5.1–7.0%, to 52.7%, 95% CI = 47.4–58.0%; IRT: 7.8%, 95% CI = 6.8–8.9%, to 52.7%, 95% CI = 47.4–58.0%). Across studies, SCD prevalence was higher in men than women, at lower levels of education, in Asian and Black African people compared to White people, in lower- and middle-income countries compared to high-income countries, and in studies conducted in later decades. Conclusions: SCD is frequent in old age. That a quarter of older individuals report SCD warrants further investigation of its significance as a risk stage for AD and other dementias, and of ways to help individuals with SCD who seek medical advice. Moreover, a standardized instrument to measure SCD is needed to overcome the measurement variability currently dominant in the field.
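As a minimal illustration of the kind of per-cohort estimate being pooled in studies like this one, the sketch below computes a prevalence point estimate with a normal-approximation 95% CI. The cohort names and counts are invented, and the paper's actual qualitative and IRT harmonization machinery is far richer than this:

```python
import math

def prevalence_ci(cases, n, z=1.96):
    """Point prevalence with a normal-approximation 95% CI."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

# Invented cohort counts, purely for illustration.
cohorts = {"cohort_1": (250, 1000), "cohort_2": (60, 980)}

p, lo, hi = prevalence_ci(250, 1000)  # cohort_1 alone

# Naive pooled prevalence: total cases over total participants.
pooled = sum(c for c, _ in cohorts.values()) / sum(n for _, n in cohorts.values())
```

A real pooled analysis would weight cohorts and model between-study heterogeneity rather than simply summing counts.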
Breaking Digital Health Barriers Through a Large Language Model–Based Tool for Automated Observational Medical Outcomes Partnership Mapping: Development and Validation Study
The integration of diverse clinical data sources requires standardization through models such as Observational Medical Outcomes Partnership (OMOP). However, mapping data elements to OMOP concepts demands significant technical expertise and time. While large health care systems often have resources for OMOP conversion, smaller clinical trials and studies frequently lack such support, leaving valuable research data siloed. This study aims to develop and validate a user-friendly tool that leverages large language models to automate the OMOP conversion process for clinical trials, electronic health records, and registry data. We developed a 3-tiered semantic matching system using GPT-3 embeddings to transform heterogeneous clinical data to the OMOP Common Data Model. The system processes input terms by generating vector embeddings, computing cosine similarity against precomputed Observational Health Data Sciences and Informatics vocabulary embeddings, and ranking potential matches. We validated the system using two independent datasets: (1) a development set of 76 National Institutes of Health Helping to End Addiction Long-term Initiative clinical trial common data elements for chronic pain and opioid use disorders and (2) a separate validation set of electronic health record concepts from the National Institutes of Health National COVID Cohort Collaborative COVID-19 enclave. The architecture combines Unified Medical Language System semantic frameworks with asynchronous processing for efficient concept mapping, made available through an open-source implementation. The system achieved an area under the receiver operating characteristic curve of 0.9975 for mapping clinical trial common data element terms. Precision ranged from 0.92 to 0.99 and recall ranged from 0.88 to 0.97 across similarity thresholds from 0.85 to 1.0. 
In practical application, the tool successfully automated mappings that previously required manual informatics expertise, reducing the technical barriers for research teams to participate in large-scale, data-sharing initiatives. Representative mappings demonstrated high accuracy, such as demographic terms achieving 100% similarity with corresponding Logical Observation Identifiers Names and Codes concepts. The implementation successfully processes diverse data types through both individual term mapping and batch processing capabilities. Our validated large language model-based tool effectively automates the transformation of clinical data into the OMOP format while maintaining high accuracy. The combination of semantic matching capabilities and a researcher-friendly interface makes data harmonization accessible to smaller research teams without requiring extensive informatics support. This has direct implications for accelerating clinical research data standardization and enabling broader participation in initiatives such as the National Institutes of Health Helping to End Addiction Long-term Initiative Data Ecosystem.
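A minimal sketch of the embedding-plus-cosine-similarity matching described above, with toy 3-d vectors standing in for real model embeddings. The vocabulary terms, query, and threshold are illustrative; the study's actual pipeline ranks against precomputed OHDSI vocabulary embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_candidates(query_vec, vocab_vecs, vocab_terms, threshold=0.85):
    """Rank vocabulary concepts by cosine similarity to a query embedding,
    keeping only matches at or above the similarity threshold."""
    scored = [(term, cosine_similarity(query_vec, vec))
              for term, vec in zip(vocab_terms, vocab_vecs)]
    scored = [(t, s) for t, s in scored if s >= threshold]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy 3-d "embeddings" standing in for real model output.
vocab_terms = ["Body weight", "Heart rate", "Systolic blood pressure"]
vocab_vecs = [np.array([1.0, 0.1, 0.0]),
              np.array([0.0, 1.0, 0.2]),
              np.array([0.1, 0.2, 1.0])]
query = np.array([0.9, 0.15, 0.05])  # e.g. embedding of "patient weight"
matches = rank_candidates(query, vocab_vecs, vocab_terms)
```

With real embeddings the vocabulary side would be precomputed once and the per-query work reduces to one matrix-vector product and a sort.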
Informing Harmonization Decisions in Integrative Data Analysis: Exploring the Measurement Multiverse
Combining datasets in an integrative data analysis (IDA) requires researchers to make a number of decisions about how best to harmonize item responses across datasets. This entails two sets of steps: logical harmonization, which involves combining items which appear similar across datasets, and analytic harmonization, which involves using psychometric models to find and account for cross-study differences in measurement. Embedded in logical and analytic harmonization are many decisions, from deciding whether items can be combined prima facie to how best to find covariate effects on specific items. Researchers may not have specific hypotheses about these decisions, and each individual choice may seem arbitrary, but the cumulative effects of these decisions are unknown. In the current study, we conducted an IDA of the relationship between alcohol use and delinquency using three datasets (total N = 2245). For analytic harmonization, we used moderated nonlinear factor analysis (MNLFA) to generate factor scores for delinquency. We conducted both logical and analytic harmonization 72 times, each time making a different set of decisions. We assessed the cumulative influence of these decisions on MNLFA parameter estimates, factor scores, and estimates of the relationship between delinquency and alcohol use. There were differences across paths in MNLFA parameter estimates, but fewer differences in estimates of factor scores and regression parameters linking delinquency to alcohol use. These results suggest that factor scores may be relatively robust to subtly different decisions in data harmonization, and measurement model parameters are less so.
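The "multiverse" of harmonization decisions can be enumerated mechanically. The sketch below uses hypothetical decision points (the study's actual 72 paths involve different choices) to show how a few binary or ternary choices multiply into many analysis paths:

```python
from itertools import product

# Hypothetical harmonization decision points; each combination of choices
# defines one "path" through the multiverse of analyses.
decisions = {
    "item_matching":  ["exact_wording", "conceptual"],   # logical harmonization
    "response_scale": ["dichotomize", "keep_ordinal"],
    "dif_covariates": ["none", "age", "age_and_sex"],    # analytic harmonization
    "anchor_items":   ["all_items", "invariant_subset"],
}

# Cartesian product of all choices: 2 * 2 * 3 * 2 = 24 distinct paths.
paths = [dict(zip(decisions, combo)) for combo in product(*decisions.values())]
```

In a multiverse analysis, the full model (here, the MNLFA and downstream regression) would be refit once per path and the distribution of estimates compared across paths.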
Pioneering a multi-phase framework to harmonize self-reported sleep data across cohorts
Study Objectives: Harmonizing and aggregating data across studies enables pooled analyses that support external validation and enhance replicability and generalizability. However, the multidimensional nature of sleep poses challenges for data harmonization and aggregation. Here we describe and implement our process for harmonizing self-reported sleep data. Methods: We established a multi-phase framework to harmonize self-reported sleep data: (1) compile items, (2) group items into domains, (3) harmonize items, and (4) evaluate harmonizability. We applied this process to produce a pooled multi-cohort sample of five US cohorts plus a separate yet fully harmonized sample from Rotterdam, Netherlands. Sleep and sociodemographic data are described and compared to demonstrate the utility of harmonization and aggregation. Results: We collected 190 unique self-reported sleep items and grouped them into 15 conceptual domains. Using these domains as guiderails, we developed 14 harmonized items measuring aspects of satisfaction, alertness/sleepiness, timing, efficiency, duration, insomnia, and sleep apnea. External raters determined that 13 of these 14 items had moderate-to-high harmonizability. Alertness/Sleepiness items had lower harmonizability, while continuous, quantitative items (e.g. timing, total sleep time, and efficiency) had higher harmonizability. Descriptive statistics identified features that are more consistent (e.g. wake-up time and duration) and more heterogeneous (e.g. time in bed and bedtime) across samples. Conclusions: Our process can guide researchers and cohort stewards toward effective sleep harmonization and provide a foundation for further methodological development in this expanding field. Broader national and international initiatives promoting common data elements across cohorts are needed to enhance future harmonization and aggregation efforts.
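The "harmonize items" step of such a framework often reduces to recoding cohort-specific response formats onto one target scale. A minimal sketch, with invented cohort names, response formats, and category midpoints:

```python
def harmonize_duration(cohort, raw):
    """Map a cohort-specific sleep-duration response to hours of sleep.
    Cohort names, formats, and band midpoints are hypothetical."""
    if cohort == "cohort_A":       # reports hours directly
        return float(raw)
    if cohort == "cohort_B":       # reports minutes
        return raw / 60.0
    if cohort == "cohort_C":       # categorical bands -> band midpoints
        bands = {"<5h": 4.5, "5-6h": 5.5, "6-7h": 6.5,
                 "7-8h": 7.5, ">8h": 8.5}
        return bands[raw]
    raise ValueError(f"unknown cohort: {cohort}")

hours = [harmonize_duration("cohort_A", 7.5),
         harmonize_duration("cohort_B", 420),
         harmonize_duration("cohort_C", "7-8h")]
```

The "evaluate harmonizability" step then asks how much information recodings like the categorical-band midpoints lose relative to the continuous items.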
scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data
scRNA-seq dataset integration occurs in different contexts, such as the identification of cell type-specific differences in gene expression across conditions or species, or batch effect correction. We present scAlign, an unsupervised deep learning method for data integration that can incorporate partial, overlapping, or a complete set of cell labels, and estimate per-cell differences in gene expression across datasets. scAlign performance is state-of-the-art and robust to cross-dataset variation in cell type-specific expression and cell type composition. We demonstrate that scAlign reveals gene expression programs for rare populations of malaria parasites. Our framework is widely applicable to integration challenges in other domains.
A comparison of methods to harmonize cortical thickness measurements across scanners and sites
Results of neuroimaging datasets aggregated from multiple sites may be biased by site-specific profiles in participants’ demographic and clinical characteristics, as well as MRI acquisition protocols and scanning platforms. We compared the impact of four different harmonization methods on results obtained from analyses of cortical thickness data: (1) linear mixed-effects model (LME) that models site-specific random intercepts (LMEINT), (2) LME that models both site-specific random intercepts and age-related random slopes (LMEINT+SLP), (3) ComBat, and (4) ComBat with a generalized additive model (ComBat-GAM). Our test case for comparing harmonization methods was cortical thickness data aggregated from 29 sites, which included 1,340 cases with posttraumatic stress disorder (PTSD) (6.2–81.8 years old) and 2,057 trauma-exposed controls without PTSD (6.3–85.2 years old). We found that, compared to the other data harmonization methods, data processed with ComBat-GAM was more sensitive to the detection of significant case-control differences (Χ2(3) = 63.704, p < 0.001) as well as case-control differences in age-related cortical thinning (Χ2(3) = 12.082, p = 0.007). Both ComBat and ComBat-GAM outperformed LME methods in detecting sex differences (Χ2(3) = 9.114, p = 0.028) in regional cortical thickness. ComBat-GAM also led to stronger estimates of age-related declines in cortical thickness (corrected p-values < 0.001), stronger estimates of case-related cortical thickness reduction (corrected p-values < 0.001), weaker estimates of age-related declines in cortical thickness in cases than controls (corrected p-values < 0.001), stronger estimates of cortical thickness reduction in females than males (corrected p-values < 0.001), and stronger estimates of cortical thickness reduction in females relative to males in cases than controls (corrected p-values < 0.001). 
Our results support the use of ComBat-GAM to minimize confounds and increase statistical power when harmonizing data with non-linear effects, and the use of either ComBat or ComBat-GAM for harmonizing data with linear effects.
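A stripped-down location/scale site correction conveys the core idea behind ComBat-style harmonization; note that real ComBat additionally shrinks the site parameters via empirical Bayes and preserves covariate effects of interest, both of which this sketch omits:

```python
import numpy as np

def locscale_harmonize(values, sites):
    """Simplified location/scale site correction: remove each site's mean
    and scale, then restore the pooled mean and scale. Not full ComBat:
    no empirical-Bayes shrinkage, no covariate preservation."""
    values = np.asarray(values, dtype=float)
    sites = np.asarray(sites)
    grand_mean, grand_sd = values.mean(), values.std()
    out = np.empty_like(values)
    for s in np.unique(sites):
        mask = sites == s
        m, sd = values[mask].mean(), values[mask].std()
        out[mask] = (values[mask] - m) / sd * grand_sd + grand_mean
    return out

# Two toy "sites" with a large mean offset between them.
vals = [1, 2, 3, 4, 11, 12, 13, 14]
sites = ["a"] * 4 + ["b"] * 4
adjusted = locscale_harmonize(vals, sites)
```

After adjustment the two site means coincide; the danger of such a naive correction, which ComBat's covariate terms address, is that it also removes any true biological difference confounded with site.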
Long time series (1984–2020) of albedo variations on the Greenland ice sheet from harmonized Landsat and Sentinel 2 imagery
Albedo is a key factor in modulating the absorption of solar radiation on ice surfaces. Satellite measurements have shown a general reduction in albedo across the Greenland ice sheet over the past few decades, particularly along the western margin of the ice sheet, a region known as the Dark Zone (albedo < 0.45). Here we combined Landsat 4–8 and Sentinel 2 imagery to derive the longest record of albedo variations in the Dark Zone, running from 1984 to 2020. We developed a simple, pragmatic and efficient sensor transformation to provide a long time series of consistent, harmonized satellite imagery. Narrow-to-broadband conversion algorithms were developed from regression models of harmonized satellite data and in situ albedo from the Programme for Monitoring of the Greenland Ice Sheet (PROMICE) automatic weather stations. The albedo derived from the harmonized Landsat and Sentinel 2 data shows that the maximum extent of the Dark Zone expanded rapidly between 2005 and 2007, growing from an average annual maximum extent of 2900 km² to ~8000 km² (~280% of the former), and has remained at that extent since. The Dark Zone is continuing to darken slowly, with the average annual minimum albedo decreasing at a rate of ~−0.0006 ± 0.0004 a⁻¹ (p = 0.16, 2001–2020).
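A narrow-to-broadband conversion of the kind described can be sketched as an ordinary least-squares fit of in situ broadband albedo on narrowband reflectances. The band weights, intercept, and synthetic "station" data below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic narrowband reflectances for 50 station-days (e.g. green, red, NIR)
# and a known linear rule standing in for the true narrow-to-broadband relation.
bands = rng.uniform(0.1, 0.9, size=(50, 3))
true_w = np.array([0.25, 0.45, 0.30])          # invented band weights
albedo = bands @ true_w + 0.02                  # synthetic in situ broadband albedo

# Fit weights plus intercept by least squares.
X = np.column_stack([bands, np.ones(len(bands))])
coef, *_ = np.linalg.lstsq(X, albedo, rcond=None)
predicted = X @ coef
```

In practice the fit would be done per sensor (after the sensor-harmonization step) against PROMICE station measurements, with noise and validation on held-out stations.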
Effects of outliers on remote sensing‐assisted forest biomass estimation: A case study from the United States national forest inventory
Large‐scale ecological sampling networks, such as national forest inventories (NFIs), collect in situ data to support biodiversity monitoring, forest management and planning, and greenhouse gas reporting. Data harmonization aims to link auxiliary remotely sensed data to field‐collected data to expand beyond field sampling plots, but outliers that arise in data harmonization—questionable observations because their values differ substantially from the rest—are rarely addressed. In this paper, we review the sources of commonly occurring outliers, including random chance (statistical outliers), definitions and protocols set by sampling networks, and temporal and spatial mismatch between field‐collected and remotely sensed data. We illustrate different types of outliers and the effects they have on estimates of above‐ground biomass population parameters using a case study of 292 NFI plots paired with airborne laser scanning (ALS) and Sentinel‐2 data from Sawyer County, Wisconsin, United States. Depending on the criteria used to identify outliers (sampling year, plot location error, nonresponse, presence of zeros and model residuals), as many as 53 of the 292 Forest Inventory and Analysis plot observations (18%) were identified as potential outliers using a single criterion and 111 plot observations (38%) if all criteria were used. Inclusion or removal of potential outliers led to substantial differences in estimates of mean and standard error of the estimate of biomass per unit area. The simple expansion estimator, which does not rely on ALS or other auxiliary data, was more sensitive to outliers than model‐assisted approaches that incorporated ALS and Sentinel‐2 data. Including Sentinel‐2 predictors showed minimal increases to the precision of our estimates relative to models with ALS predictors alone. Outliers arise from many causes and can be pervasive in data harmonization workflows. 
Our review and case study serve as a note of caution to researchers and practitioners that the inclusion or removal of potential outliers can have unintended consequences on population parameter estimates. When used to inform large‐scale biomass mapping, carbon markets, greenhouse gas reporting and environmental policy, it is necessary to ensure the proper use of NFI and remotely sensed data in geospatial data harmonization.
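A multi-criterion outlier screen like the one reviewed can be sketched as a set of independent flags per plot, so that sensitivity to each criterion can be assessed separately. The field names and thresholds below are hypothetical:

```python
def flag_outliers(plot):
    """Return the set of criteria under which a plot record is a potential
    outlier. Field names and thresholds are illustrative, not the study's."""
    flags = set()
    if abs(plot["field_year"] - plot["als_year"]) > 2:
        flags.add("temporal_mismatch")       # field vs. ALS acquisition year
    if plot["gps_error_m"] > 10:
        flags.add("location_error")          # plot position uncertainty
    if plot["nonresponse"]:
        flags.add("nonresponse")             # plot denied access / not measured
    if plot["biomass"] == 0 and plot["als_height_m"] > 5:
        flags.add("zero_with_tall_canopy")   # zero biomass under tall returns
    return flags

plot = {"field_year": 2015, "als_year": 2019, "gps_error_m": 3.0,
        "nonresponse": False, "biomass": 0.0, "als_height_m": 12.0}
flags = flag_outliers(plot)
```

Re-running the population estimators with plots excluded under each flag, singly and in combination, is one way to quantify the sensitivity the case study reports.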