Catalogue Search | MBRL
825 result(s) for "Data harmonization"
The quest for seafloor macrolitter: a critical review of background knowledge, current methods and future prospects
by Giorgetti, Alessandra; van Sebille, Erik; Bergmann, Melanie
in [SDV] Life Sciences [q-bio]; data harmonisation; deep sea; marine litter; modelling; seafloor; trawl surveys; visual surveys
2021
The seafloor covers some 70% of the Earth’s surface and has been recognised as a major sink for marine litter. Still, litter on the seafloor is the least investigated fraction of marine litter, which is not surprising as most of it lies in the deep sea, i.e. the least explored ecosystem. Although marine litter is considered a major threat to the oceans, monitoring frameworks are still being set up. This paper reviews current knowledge and methods, identifies existing needs, and points to the future developments required to estimate seafloor macrolitter. It provides background knowledge and conveys the views of scientific experts on seafloor marine litter, offering a review of monitoring and ocean modelling techniques. Knowledge gaps that need to be tackled, data needs for modelling, and data comparability and harmonisation are also discussed. In addition, it shows how research on seafloor macrolitter can inform international protection and conservation frameworks to prioritise efforts and measures against marine litter and its deleterious impacts.
Journal Article
How European Research Projects Can Support Vaccination Strategies: The Case of the ORCHESTRA Project for SARS-CoV-2
by Caroline Stellmach; Maddalena Giannella; Surbhi Malhotra-Kumar
in Antibodies; Cohort analysis; cohort study
2023
ORCHESTRA (“Connecting European Cohorts to Increase Common and Effective Response To SARS-CoV-2 Pandemic”) is an EU-funded project that aims to rapidly advance knowledge about the prevention of SARS-CoV-2 infection and the management of COVID-19 and its long-term sequelae. Here, we describe the project's early results, focusing on the strengths of multiple international, historical and prospective cohort studies and highlighting results of potential relevance to vaccination strategies: the need for a booster dose after a primary vaccination course in hematologic cancer patients and in solid organ transplant recipients to elicit a higher antibody titer, and the protective effect of vaccination against severe clinical manifestations of COVID-19 and against the emergence of post-COVID-19 conditions. Valuable data on epidemiological variations, risk factors for SARS-CoV-2 infection and its sequelae, and vaccination efficacy in different subpopulations can help further define public health vaccination policies.
Journal Article
Informing Harmonization Decisions in Integrative Data Analysis: Exploring the Measurement Multiverse
2023
Combining datasets in an integrative data analysis (IDA) requires researchers to make a number of decisions about how best to harmonize item responses across datasets. This entails two sets of steps: logical harmonization, which involves combining items that appear similar across datasets, and analytic harmonization, which involves using psychometric models to find and account for cross-study differences in measurement. Embedded in logical and analytic harmonization are many decisions, from deciding whether items can be combined prima facie to how best to find covariate effects on specific items. Researchers may not have specific hypotheses about these decisions, and each individual choice may seem arbitrary, but the cumulative effects of these decisions are unknown. In the current study, we conducted an IDA of the relationship between alcohol use and delinquency using three datasets (total N = 2245). For analytic harmonization, we used moderated nonlinear factor analysis (MNLFA) to generate factor scores for delinquency. We conducted both logical and analytic harmonization 72 times, each time making a different set of decisions. We assessed the cumulative influence of these decisions on MNLFA parameter estimates, factor scores, and estimates of the relationship between delinquency and alcohol use. There were differences across paths in MNLFA parameter estimates, but fewer differences in estimates of factor scores and regression parameters linking delinquency to alcohol use. These results suggest that factor scores may be relatively robust to subtly different decisions in data harmonization, whereas measurement model parameters are less so.
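The multiverse design described in this abstract amounts to crossing every option of every harmonization decision and rerunning the analysis once per combination. A minimal Python sketch of that enumeration follows. The decision names and option counts are hypothetical (the abstract does not list the actual decisions) and were chosen only so that the grid has 72 paths, matching the number of runs reported; `summarize` is likewise an illustrative helper.

```python
from itertools import product

# Hypothetical decision points (NOT taken from the paper). Option counts
# 2 x 2 x 2 x 3 x 3 give 72 paths, the number of runs reported above.
DECISIONS = {
    "combine_similar_items": ["yes", "no"],                 # logical harmonization
    "collapse_sparse_categories": ["yes", "no"],            # logical harmonization
    "item_set": ["full", "reduced"],                        # logical harmonization
    "dif_covariates": ["age", "age+sex", "age+sex+study"],  # analytic (MNLFA)
    "dif_retention_rule": ["p<.05", "p<.01", "BIC"],        # analytic (MNLFA)
}


def enumerate_paths(decisions):
    """Return one dict per combination of options, i.e. one analysis path."""
    names = list(decisions)
    return [dict(zip(names, combo)) for combo in product(*decisions.values())]


def summarize(estimates):
    """Describe how much one estimate varies across all analysis paths."""
    lo, hi = min(estimates), max(estimates)
    return {"min": lo, "max": hi, "range": hi - lo,
            "mean": sum(estimates) / len(estimates)}


paths = enumerate_paths(DECISIONS)
print(len(paths))  # 72
```

Each path would then be fit (for example, one MNLFA per path), and a quantity such as the coefficient linking delinquency to alcohol use would be collected into a list and passed to `summarize`; the model fitting itself is outside this sketch.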
Journal Article
Breaking Digital Health Barriers Through a Large Language Model–Based Tool for Automated Observational Medical Outcomes Partnership Mapping: Development and Validation Study
2025
The integration of diverse clinical data sources requires standardization through models such as Observational Medical Outcomes Partnership (OMOP). However, mapping data elements to OMOP concepts demands significant technical expertise and time. While large health care systems often have resources for OMOP conversion, smaller clinical trials and studies frequently lack such support, leaving valuable research data siloed.
This study aims to develop and validate a user-friendly tool that leverages large language models to automate the OMOP conversion process for clinical trials, electronic health records, and registry data.
We developed a 3-tiered semantic matching system using GPT-3 embeddings to transform heterogeneous clinical data to the OMOP Common Data Model. The system processes input terms by generating vector embeddings, computing cosine similarity against precomputed Observational Health Data Sciences and Informatics vocabulary embeddings, and ranking potential matches. We validated the system using two independent datasets: (1) a development set of 76 National Institutes of Health Helping to End Addiction Long-term Initiative clinical trial common data elements for chronic pain and opioid use disorders and (2) a separate validation set of electronic health record concepts from the National Institutes of Health National COVID Cohort Collaborative COVID-19 enclave. The architecture combines Unified Medical Language System semantic frameworks with asynchronous processing for efficient concept mapping, made available through an open-source implementation.
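The core matching step described above (embed the input term, score it against precomputed vocabulary embeddings by cosine similarity, rank the candidates) can be sketched in plain Python. The three-dimensional vectors and concept labels below are hypothetical stand-ins for real GPT-3 and OHDSI vocabulary embeddings, and this single threshold-and-rank pass does not reproduce the paper's 3-tiered system.

```python
import math

# Hypothetical 3-dimensional embeddings; real GPT-3 embeddings have far more
# dimensions, and these concept labels are invented for illustration.
VOCAB_EMBEDDINGS = {
    "concept_A (body weight)": [0.90, 0.10, 0.00],
    "concept_B (body height)": [0.10, 0.90, 0.10],
    "concept_C (opioid use disorder)": [0.00, 0.20, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_matches(query_vec, vocab, threshold=0.85, top_k=3):
    """Score every vocabulary entry, keep those at or above threshold, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in vocab.items()]
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Hypothetical embedding of an input term such as "weight (kg)".
query = [0.88, 0.12, 0.02]
print(rank_matches(query, VOCAB_EMBEDDINGS))
```

Because the vocabulary embeddings are precomputed, each new term costs one embedding call plus a similarity scan; the 0.85 default here simply echoes the lower end of the threshold range reported in the results.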
The system achieved an area under the receiver operating characteristic curve of 0.9975 for mapping clinical trial common data element terms. Precision ranged from 0.92 to 0.99 and recall ranged from 0.88 to 0.97 across similarity thresholds from 0.85 to 1.0. In practical application, the tool successfully automated mappings that previously required manual informatics expertise, reducing the technical barriers for research teams to participate in large-scale, data-sharing initiatives. Representative mappings demonstrated high accuracy, such as demographic terms achieving 100% similarity with corresponding Logical Observation Identifiers Names and Codes concepts. The implementation successfully processes diverse data types through both individual term mapping and batch processing capabilities.
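Precision and recall at a similarity threshold follow from counting accepted and missed correct mappings. The sketch below assumes one top-ranked candidate per term, labeled correct or incorrect by a reviewer; the scores are invented for illustration and are not the study's data, and the paper may define its counts differently.

```python
def precision_recall(scored, threshold):
    """scored: list of (similarity, is_correct) pairs, one top candidate per term.

    A candidate is accepted when its similarity is >= threshold.
    """
    tp = sum(1 for sim, ok in scored if sim >= threshold and ok)
    fp = sum(1 for sim, ok in scored if sim >= threshold and not ok)
    fn = sum(1 for sim, ok in scored if sim < threshold and ok)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical reviewer-labeled scores (not from the study).
SCORED = [(0.99, True), (0.97, True), (0.93, True), (0.90, False),
          (0.88, True), (0.86, False), (0.80, True)]

for t in (0.85, 0.90, 0.95, 1.0):
    p, r = precision_recall(SCORED, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

In this toy data, raising the threshold trades recall for precision; at 1.0 nothing is accepted, so precision is undefined (printed as nan).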
Our validated large language model-based tool effectively automates the transformation of clinical data into the OMOP format while maintaining high accuracy. The combination of semantic matching capabilities and a researcher-friendly interface makes data harmonization accessible to smaller research teams without requiring extensive informatics support. This has direct implications for accelerating clinical research data standardization and enabling broader participation in initiatives such as the National Institutes of Health Helping to End Addiction Long-term Initiative Data Ecosystem.
Journal Article
Estimating prevalence of subjective cognitive decline in and across international cohort studies of aging: a COSMIC study
by Ritchie, Karen; Chen, Sanmei; Snitz, Beth E.
in Aging; Alzheimer's disease; Biomedical and Life Sciences
2020
Background
Subjective cognitive decline (SCD) is recognized as a risk stage for Alzheimer’s disease (AD) and other dementias, but its prevalence is not well known. We aimed to use uniform criteria to better estimate SCD prevalence across international cohorts.
Methods
We combined individual participant data for 16 cohorts from 15 countries (members of the COSMIC consortium) and used qualitative and quantitative (Item Response Theory/IRT) harmonization techniques to estimate SCD prevalence.
Results
The sample comprised 39,387 cognitively unimpaired individuals above age 60. The prevalence of SCD across studies was around one quarter with both qualitative harmonization (QH: 23.8%, 95% CI = 23.3–24.4%) and IRT (25.6%, 95% CI = 25.1–26.1%); however, prevalence estimates varied widely between studies (QH: 6.1%, 95% CI = 5.1–7.0%, to 52.7%, 95% CI = 47.4–58.0%; IRT: 7.8%, 95% CI = 6.8–8.9%, to 52.7%, 95% CI = 47.4–58.0%). Across studies, SCD prevalence was higher in men than in women, in people with lower levels of education, in Asian and Black African people compared with White people, in lower- and middle-income countries compared with high-income countries, and in studies conducted in later decades.
Conclusions
SCD is frequent in old age. That a quarter of older individuals have SCD warrants further investigation of its significance as a risk stage for AD and other dementias, and of ways to help individuals with SCD who seek medical advice. Moreover, a standardized instrument to measure SCD is needed to overcome the measurement variability currently dominant in the field.
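The abstract reports each prevalence as a percentage with a 95% confidence interval but does not state how the intervals were computed. As an illustration only, the Wald (normal-approximation) interval for a single proportion is p ± 1.96 × sqrt(p(1 − p)/n); the sketch below applies it to hypothetical counts. The pooled COSMIC estimates may rest on a different method (for example, one that accounts for clustering by study), so this is not a reproduction of the reported intervals.

```python
import math


def wald_ci(cases, n, z=1.96):
    """Prevalence and Wald (normal-approximation) confidence interval."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1]; the Wald interval can spill outside it for small n or extreme p.
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical cohort: 250 of 1,000 participants meet the SCD criteria.
p, lo, hi = wald_ci(250, 1000)
print(f"{100 * p:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")  # 25.0% (95% CI 22.3-27.7%)
```

Because the standard error shrinks with the square root of n, a pooled sample of tens of thousands yields a far narrower interval than a single cohort of a few hundred.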
Journal Article
Pioneering a multi-phase framework to harmonize self-reported sleep data across cohorts
2024
Abstract
Study Objectives
Harmonizing and aggregating data across studies enables pooled analyses that support external validation and enhance replicability and generalizability. However, the multidimensional nature of sleep poses challenges for data harmonization and aggregation. Here we describe and implement our process for harmonizing self-reported sleep data.
Methods
We established a multi-phase framework to harmonize self-reported sleep data: (1) compile items, (2) group items into domains, (3) harmonize items, and (4) evaluate harmonizability. We applied this process to produce a pooled multi-cohort sample of five US cohorts plus a separate yet fully harmonized sample from Rotterdam, Netherlands. Sleep and sociodemographic data are described and compared to demonstrate the utility of harmonization and aggregation.
Results
We collected 190 unique self-reported sleep items and grouped them into 15 conceptual domains. Using these domains as guiderails, we developed 14 harmonized items measuring aspects of satisfaction, alertness/sleepiness, timing, efficiency, duration, insomnia, and sleep apnea. External raters determined that 13 of these 14 items had moderate-to-high harmonizability. Alertness/Sleepiness items had lower harmonizability, while continuous, quantitative items (e.g. timing, total sleep time, and efficiency) had higher harmonizability. Descriptive statistics identified features that are more consistent (e.g. wake-up time and duration) and more heterogeneous (e.g. time in bed and bedtime) across samples.
Conclusions
Our process can guide researchers and cohort stewards toward effective sleep harmonization and provide a foundation for further methodological development in this expanding field. Broader national and international initiatives promoting common data elements across cohorts are needed to enhance future harmonization and aggregation efforts.
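Phase 3 of the framework (harmonize items) often comes down to converting cohort-specific formats into one common unit. A minimal sketch, assuming two hypothetical cohorts that record bedtime and wake-up time as 24-hour clock strings but report total sleep time in hours and minutes respectively, derives time in bed (wrapping past midnight) and sleep efficiency (total sleep time divided by time in bed, a common definition). The item names are invented; the paper's 14 harmonized items are not reproduced here.

```python
def minutes_since_midnight(hhmm):
    """Convert a 'HH:MM' 24-hour clock string to minutes after midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def time_in_bed_hours(bedtime, waketime):
    """Hours from bedtime to wake-up time, wrapping past midnight."""
    diff = minutes_since_midnight(waketime) - minutes_since_midnight(bedtime)
    return (diff % (24 * 60)) / 60


def harmonize(record, cohort):
    """Map a cohort-specific record onto common fields (hypothetical item names)."""
    if cohort == "A":  # cohort A reports sleep duration in hours
        tib = time_in_bed_hours(record["bed_time"], record["wake_time"])
        tst = record["sleep_hours"]
    elif cohort == "B":  # cohort B reports sleep duration in minutes
        tib = time_in_bed_hours(record["lights_out"], record["rise_time"])
        tst = record["sleep_minutes"] / 60
    else:
        raise ValueError(f"unknown cohort: {cohort}")
    return {"time_in_bed_h": tib, "total_sleep_h": tst,
            "efficiency_pct": 100 * tst / tib if tib else None}


a = harmonize({"bed_time": "23:00", "wake_time": "07:00", "sleep_hours": 7}, "A")
b = harmonize({"lights_out": "22:30", "rise_time": "06:30", "sleep_minutes": 420}, "B")
print(a)
print(b)
```

The modulo wrap assumes time in bed is under 24 hours; records where bedtime equals wake-up time get a time in bed of zero and no efficiency value.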
Journal Article
scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data
2019
scRNA-seq dataset integration occurs in different contexts, such as identifying cell type-specific differences in gene expression across conditions or species, or correcting batch effects. We present scAlign, an unsupervised deep learning method for data integration that can incorporate partial, overlapping, or complete sets of cell labels, and can estimate per-cell differences in gene expression across datasets. scAlign achieves state-of-the-art performance and is robust to cross-dataset variation in cell type-specific expression and cell type composition. We demonstrate that scAlign reveals gene expression programs for rare populations of malaria parasites. Our framework is widely applicable to integration challenges in other domains.
Journal Article
A comparison of methods to harmonize cortical thickness measurements across scanners and sites
2022
Results of neuroimaging datasets aggregated from multiple sites may be biased by site-specific profiles in participants’ demographic and clinical characteristics, as well as MRI acquisition protocols and scanning platforms. We compared the impact of four different harmonization methods on results obtained from analyses of cortical thickness data: (1) linear mixed-effects model (LME) that models site-specific random intercepts (LMEINT), (2) LME that models both site-specific random intercepts and age-related random slopes (LMEINT+SLP), (3) ComBat, and (4) ComBat with a generalized additive model (ComBat-GAM). Our test case for comparing harmonization methods was cortical thickness data aggregated from 29 sites, which included 1,340 cases with posttraumatic stress disorder (PTSD) (6.2–81.8 years old) and 2,057 trauma-exposed controls without PTSD (6.3–85.2 years old). We found that, compared to the other data harmonization methods, data processed with ComBat-GAM was more sensitive to the detection of significant case-control differences (χ²(3) = 63.704, p < 0.001) as well as case-control differences in age-related cortical thinning (χ²(3) = 12.082, p = 0.007). Both ComBat and ComBat-GAM outperformed LME methods in detecting sex differences (χ²(3) = 9.114, p = 0.028) in regional cortical thickness. ComBat-GAM also led to stronger estimates of age-related declines in cortical thickness (corrected p-values < 0.001), stronger estimates of case-related cortical thickness reduction (corrected p-values < 0.001), weaker estimates of age-related declines in cortical thickness in cases than controls (corrected p-values < 0.001), stronger estimates of cortical thickness reduction in females than males (corrected p-values < 0.001), and stronger estimates of cortical thickness reduction in females relative to males in cases than controls (corrected p-values < 0.001).
Our results support the use of ComBat-GAM to minimize confounds and increase statistical power when harmonizing data with non-linear effects, and the use of either ComBat or ComBat-GAM for harmonizing data with linear effects.
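ComBat's core idea is a per-site location and scale adjustment. The stripped-down Python sketch below applies that idea to one feature (for example, one region's cortical thickness): each value is re-centered from its site mean to the grand mean and rescaled from its site standard deviation to the pooled within-site standard deviation. It is not ComBat itself. It omits the covariate model that preserves effects such as age, sex and diagnosis, the empirical Bayes shrinkage of site parameters, and ComBat-GAM's nonlinear age terms. Without covariates, any real group difference confounded with site is removed along with the site effect. The data are hypothetical.

```python
from statistics import mean, pstdev


def location_scale_harmonize(values, sites):
    """Align each site's mean and SD to the grand mean and pooled within-site SD.

    Simplified: no covariates and no empirical Bayes shrinkage (unlike ComBat).
    """
    by_site = {}
    for v, s in zip(values, sites):
        by_site.setdefault(s, []).append(v)
    site_mean = {s: mean(vs) for s, vs in by_site.items()}
    site_sd = {s: pstdev(vs) for s, vs in by_site.items()}

    grand_mean = mean(values)
    # Pooled within-site SD: spread of residuals around each site's own mean.
    pooled = pstdev([v - site_mean[s] for v, s in zip(values, sites)])

    harmonized = []
    for v, s in zip(values, sites):
        # Standardize within site (a single-value site has SD 0, so z = 0).
        z = (v - site_mean[s]) / site_sd[s] if site_sd[s] else 0.0
        harmonized.append(grand_mean + z * pooled)
    return harmonized


# Hypothetical thickness values (mm) for one region at two sites.
thickness = [2.5, 2.7, 2.6, 2.9, 3.1, 3.3]
site = ["A", "A", "A", "B", "B", "B"]
print([round(x, 3) for x in location_scale_harmonize(thickness, site)])
```

After adjustment both sites share the same mean and spread, which is exactly why the covariate terms in real ComBat matter: they keep biological differences from being absorbed into the site correction.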
Journal Article
Breaking Silos in Caregiving Research: Toward Unified Measures Across the Lifespan
2025
The number of caregivers is increasing globally, making it imperative that we better understand the impact of caregiving and identify methods to address caregiver needs and health. These goals are best achieved with unified research approaches and measures that facilitate comparison across studies. Despite the need and policy support for unified research on caregiving, research often happens in silos that are diagnosis- or age-specific. To address this need, we interviewed 33 researchers who (1) identified the process and outcome measures they commonly used in their research and (2) explained their selection. We found that researchers across the lifespan are using similar measures in their studies and are consistent in what they look for in a measure. Researchers also described barriers they face when selecting measures, including inadequacy of current measures, familiarity, need for rigor, and measurement characteristics. These findings highlight the need to create and disseminate a prioritized list of the process and outcome measures used by caregiving researchers.
Journal Article
Standardized Mean Differences: Not So Standard After All
2025
ABSTRACT
Meta-analyses often use standardized mean differences (SMDs), such as Cohen's d and Hedges' g, to compare treatment effects. However, these SMDs are highly sensitive to the within-study sample variability used for their standardization, potentially distorting individual effect size estimates and compromising overall meta-analytic conclusions. This study introduces harmonized standardized mean differences (HSMDs), a novel sensitivity analysis framework designed to systematically evaluate and address such distortions. The HSMD harmonizes relative within-study variability across studies by employing the coefficient of variation (CV) to establish empirical benchmarks (e.g., CV quartiles). SMDs are then recalculated under these consistent variability assumptions. Applying this framework to meta-analytic data reveals the extent to which the original effect sizes and pooled results are influenced by the initial, study-specific standard deviations used to standardize mean differences. Furthermore, the method allows studies lacking reported variability metrics to be included in the sensitivity analysis, enhancing the comprehensiveness of the meta-analytic synthesis.
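The quantities involved can be made concrete. Cohen's d divides the mean difference by the pooled SD, and Hedges' g applies the small-sample correction J = 1 − 3/(4(n1 + n2) − 9). For the harmonized variant, the sketch assumes that a study's CV is its pooled SD divided by the control-group mean, takes the median of the CV quartiles as the benchmark, and recomputes each SMD with SD = benchmark CV × control mean. These HSMD details are assumptions for illustration, not the paper's confirmed definitions, and the three studies are hypothetical.

```python
import math
from statistics import quantiles


def pooled_sd(sd1, sd2, n1, n2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1, m2, sd1, sd2, n1, n2):
    return (m1 - m2) / pooled_sd(sd1, sd2, n1, n2)


def hedges_g(d, n1, n2):
    # Small-sample correction J = 1 - 3 / (4 * df - 1) with df = n1 + n2 - 2.
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


def harmonized_smd(m1, m2, benchmark_cv):
    # Assumption: the SD is replaced by benchmark CV x control-group mean (m2).
    return (m1 - m2) / (benchmark_cv * m2)


# Hypothetical studies: (treatment mean, control mean, sd1, sd2, n1, n2).
STUDIES = [(12, 10, 2, 2, 20, 20), (55, 50, 5, 5, 20, 20), (22, 20, 6, 6, 20, 20)]

cvs = [pooled_sd(s1, s2, n1, n2) / m2 for _, m2, s1, s2, n1, n2 in STUDIES]
q1, median_cv, q3 = quantiles(cvs, n=4)  # empirical CV benchmarks

for m1, m2, sd1, sd2, n1, n2 in STUDIES:
    d = cohens_d(m1, m2, sd1, sd2, n1, n2)
    print(f"d={d:.2f} g={hedges_g(d, n1, n2):.2f} "
          f"HSMD(median CV)={harmonized_smd(m1, m2, median_cv):.2f}")
```

Under the median benchmark, the second study (low CV) drops from d = 1.00 to 0.50 while the third (high CV) rises from 0.33 to 0.50, showing how much of the spread in raw SMDs came from study-specific variability. The harmonized value needs only the group means, which is how studies without reported SDs could enter such a sensitivity analysis.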
Journal Article