Catalogue Search | MBRL
Explore the vast range of titles available.
815 result(s) for "Data harmonisation"
The quest for seafloor macrolitter: a critical review of background knowledge, current methods and future prospects
by Giorgetti, Alessandra; van Sebille, Erik; Bergmann, Melanie
in [SDV]Life Sciences [q-bio], Data harmonisation, data harmonisation; deep sea; marine litter; modelling; seafloor; trawl surveys; visual surveys
2021
The seafloor covers some 70% of the Earth’s surface and has been recognised as a major sink for marine litter. Still, litter on the seafloor is the least investigated fraction of marine litter, which is not surprising as most of it lies in the deep sea, i.e. the least explored ecosystem. Although marine litter is considered a major threat for the oceans, monitoring frameworks are still being set up. This paper reviews current knowledge and methods, identifies existing needs, and points to future developments that are required to address the estimation of seafloor macrolitter. It provides background knowledge and conveys the views and thoughts of scientific experts on seafloor marine litter offering a review of monitoring and ocean modelling techniques. Knowledge gaps that need to be tackled, data needs for modelling, and data comparability and harmonisation are also discussed. In addition, it shows how research on seafloor macrolitter can inform international protection and conservation frameworks to prioritise efforts and measures against marine litter and its deleterious impacts.
Journal Article
How European Research Projects Can Support Vaccination Strategies: The Case of the ORCHESTRA Project for SARS-CoV-2
by Caroline Stellmach, Maddalena Giannella, Surbhi Malhotra-Kumar
in Antibodies, Cohort analysis, cohort study
2023
ORCHESTRA (“Connecting European Cohorts to Increase Common and Effective Response To SARS-CoV-2 Pandemic”) is an EU-funded project which aims to help rapidly advance the knowledge related to the prevention of the SARS-CoV-2 infection and the management of COVID-19 and its long-term sequelae. Here, we describe the early results of this project, focusing on the strengths of multiple, international, historical and prospective cohort studies and highlighting those results which are of potential relevance for vaccination strategies, such as the necessity of a vaccine booster dose after a primary vaccination course in hematologic cancer patients and in solid organ transplant recipients to elicit a higher antibody titer, and the protective effect of vaccination on severe COVID-19 clinical manifestation and on the emergence of post-COVID-19 conditions. Valuable data regarding epidemiological variations, risk factors of SARS-CoV-2 infection and its sequelae, and vaccination efficacy in different subpopulations can support further defining public health vaccination policies.
Journal Article
The impact of physical activity on healthy ageing trajectories: evidence from eight cohort studies
2020
Background
Research has suggested the positive impact of physical activity on health and wellbeing in older age, yet few studies have investigated the associations between physical activity and heterogeneous trajectories of healthy ageing. We aimed to identify how physical activity can influence healthy ageing trajectories using a harmonised dataset of eight ageing cohorts across the world.
Methods
Based on a harmonised dataset of eight ageing cohorts in Australia, USA, Mexico, Japan, South Korea, and Europe, comprising 130,521 older adults (M_age = 62.81, SD_age = 10.06) followed up for up to 10 years (M_follow-up = 5.47, SD_follow-up = 3.22), we employed growth mixture modelling to identify latent classes of people with different trajectories of healthy ageing scores, which incorporated 41 items of health and functioning. Multinomial logistic regression modelling was used to investigate the associations between physical activity and different types of trajectories, adjusting for sociodemographic characteristics and other lifestyle behaviours.
Results
Three latent classes of healthy ageing trajectories were identified: two with stable trajectories with high (71.4%) or low (25.2%) starting points and one with a high starting point but a fast decline over time (3.4%). Engagement in any level of physical activity was associated with decreased odds of being in the low stable (OR: 0.18; 95% CI: 0.17, 0.19) and fast decline trajectories groups (OR: 0.44; 95% CI: 0.39, 0.50) compared to the high stable trajectory group. These results were replicated with alternative physical activity operationalisations, as well as in sensitivity analyses using reduced samples.
Conclusions
Our findings suggest a positive impact of physical activity on healthy ageing, attenuating declines in health and functioning. Physical activity promotion should be a key focus of healthy ageing policies to prevent disability and fast deterioration in health.
Journal Article
The EU Child Cohort Network’s core data: establishing a set of findable, accessible, interoperable and re-usable (FAIR) variables
by Pinot de Moira, Angela; Eriksson, Johan G; Heude, Barbara
in Accessibility, Collaboration, Exposure
2021
The Horizon2020 LifeCycle Project is a cross-cohort collaboration which brings together data from multiple birth cohorts from across Europe and Australia to facilitate studies on the influence of early-life exposures on later health outcomes. A major product of this collaboration has been the establishment of a FAIR (findable, accessible, interoperable and reusable) data resource known as the EU Child Cohort Network. Here we focus on the EU Child Cohort Network’s core variables. These are a set of basic variables, derivable by the majority of participating cohorts and frequently used as covariates or exposures in lifecourse research. First, we describe the process by which the list of core variables was established. Second, we explain the protocol according to which these variables were harmonised in order to make them interoperable. Third, we describe the catalogue developed to ensure that the network’s data are findable and reusable. Finally, we describe the core data, including the proportion of variables harmonised by each cohort and the number of children for whom harmonised core data are available. EU Child Cohort Network data will be analysed using a federated analysis platform, removing the need to physically transfer data and thus making the data more accessible to researchers. The network will add value to participating cohorts by increasing statistical power and exposure heterogeneity, as well as facilitating cross-cohort comparisons, cross-validation and replication. Our aim is to motivate other cohorts to join the network and encourage the use of the EU Child Cohort Network by the wider research community.
Journal Article
Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data
2024
Background
Pooling data from different sources will advance mental health research by providing larger sample sizes and allowing cross-study comparisons; however, the heterogeneity in how variables are measured across studies poses a challenge to this process.
Methods
This study explored the potential of using natural language processing (NLP) to harmonise different mental health questionnaires by matching individual questions based on their semantic content. Using the Sentence-BERT model, we calculated the semantic similarity (cosine index) between 741 pairs of questions from five questionnaires. Drawing on data from a representative UK sample of adults (N = 2,058), we calculated a Spearman rank correlation for each of the same pairs of items, and then estimated the correlation between the cosine values and Spearman coefficients. We also used network analysis to explore the model’s ability to uncover structures within the data and metadata.
Results
We found a moderate overall correlation (r = .48, p < .001) between the two indices. In a holdout sample, the cosine scores predicted the real-world correlations with a small degree of error (MAE = 0.05, MedAE = 0.04, RMSE = 0.064), suggesting the utility of NLP in identifying similar items for cross-study data pooling. Our NLP model could detect more complex patterns in our data; however, it required manual rules to decide which edges to include in the network.
Conclusions
This research shows that it is possible to quantify the semantic similarity between pairs of questionnaire items from their meta-data, and these similarity indices correlate with how participants would answer the same two items. This highlights the potential of NLP to facilitate cross-study data pooling in mental health research. Nevertheless, researchers are cautioned to verify the psychometric equivalence of matched items.
Journal Article
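The two indices compared in the record above (cosine similarity between question embeddings, and Spearman rank correlation between participants' answers) can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the embedding vectors and response lists are made-up toy values, the function names are hypothetical, and in practice the embeddings would come from a sentence-embedding model such as Sentence-BERT rather than being typed in by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: embeddings for two questionnaire items and
# the answers five participants gave to each item.
embedding_item_a = [0.9, 0.1, 0.3]
embedding_item_b = [0.8, 0.2, 0.4]
answers_item_a = [1, 2, 2, 3, 4]
answers_item_b = [1, 1, 2, 4, 3]

print("cosine index:", cosine_similarity(embedding_item_a, embedding_item_b))
print("Spearman rho:", spearman(answers_item_a, answers_item_b))
```

Repeating this for every pair of items yields one list of cosine values and one list of Spearman coefficients; passing those two lists to `pearson` gives the kind of overall correlation between indices that the abstract reports.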
Harmonisation of assessments of attention, social, emotional, and behaviour problems using the Child Behavior Checklist and the Strengths and Difficulties Questionnaire
by Jaekel, Julia; Johnson, Samantha; Marlow, Neil
in CBCL, data harmonisation, measurement invariance
2024
Objectives
Retrospective harmonisation of data obtained through different instruments creates measurement error, even if the underlying concepts are assumed the same. We tested a novel method for item‐level data harmonisation of two widely used instruments that measure emotional and behavioural problems: the Child Behavior Checklist (CBCL) and the Strengths and Difficulties Questionnaire (SDQ).
Methods
Item content of the CBCL and SDQ was mapped onto four dimensions: emotional problems, peer relationship problems, hyperactivity/inattention and conduct problems. A diverse test sample was drawn from four prospective longitudinal birth cohort studies in Australia and Europe who used one or both instruments. The pooled sample included 5188 data points assessing children and adolescents aged 6–13 years (N = 257–704 participants per cohort). Measurement invariance was assessed using latent variable multi‐group confirmatory factor analysis.
Results
Fifteen items from the CBCL and SDQ were mapped onto four dimensions allowing for measurement invariance testing as part of a stepwise process. Partial strict invariance between CBCL and SDQ assessments was established for all four dimensions.
Conclusions
The harmonised dimensions of emotional, peer relationship, hyperactivity/inattention and conduct problems are invariant across the CBCL and SDQ suggesting that these dimensions can be reliably compared with limited measurement error.
Journal Article
Pretrained language models for semantics-aware data harmonisation of observational clinical studies in the era of big data
by Zlatev, Zlatko; Dylag, Jakub J.; Boniface, Michael
in Analysis, Artificial intelligence, Automation
2025
Background
In clinical research, there is a strong drive to leverage big data from population cohort studies and routine electronic healthcare records to design new interventions, improve health outcomes and increase the efficiency of healthcare delivery. However, realising this potential requires substantial effort in harmonising source datasets and curating study data, which currently relies on costly, time-consuming and labour-intensive methods. We explore and assess the use of natural language processing (NLP) and unsupervised machine learning (ML) to address the challenges of big data semantic harmonisation and curation.
Methods
Our aim is to establish an efficient and robust technological foundation for the development of automated tools supporting data curation of large clinical datasets. We propose two AI based pipelines for automated semantic harmonisation: a pipeline for semantics-aware search for domain relevant variables and a pipeline for clustering of semantically similar variables. We evaluate pipeline performance using 94,037 textual variable descriptions from the English Longitudinal Study of Ageing (ELSA) database.
Results
We observe high accuracy of our Semantic Search pipeline, with an AUC of 0.899 (SD = 0.056). Our semantic clustering pipeline achieves a V-measure of 0.237 (SD = 0.157), which is on par with that of leading implementations in other relevant domains. Automation can significantly accelerate the process of dataset harmonisation. Manual labelling was performed at a speed of 2.1 descriptions per minute, with our automated labelling increasing speed to 245 descriptions per minute.
Conclusions
Our study findings underscore the potential of AI technologies, such as NLP and unsupervised ML, in automating the harmonisation and curation of big data for clinical research. By establishing a robust technological foundation, we pave the way for the development of automated tools that streamline the process, enabling health data scientists to leverage big data more efficiently and effectively in their studies and accelerating insights from data for clinical benefit.
Journal Article
Evaluating the harmonisation potential of diverse cohort datasets
2023
Data discovery, the ability to find datasets relevant to an analysis, increases scientific opportunity, improves rigour and accelerates activity. Rapid growth in the depth, breadth, quantity and availability of data provides unprecedented opportunities and challenges for data discovery. A potential tool for increasing the efficiency of data discovery, particularly across multiple datasets, is data harmonisation. A set of 124 variables, identified as being of broad interest to neurodegeneration, were harmonised using the C-Surv data model. Harmonisation strategies used were simple calibration, algorithmic transformation and standardisation to the Z-distribution. Widely used data conventions, optimised for inclusiveness rather than aetiological precision, were used as harmonisation rules. The harmonisation scheme was applied to data from four diverse population cohorts. Of the 120 variables that were found in the datasets, correspondence between the harmonised data schema and cohort-specific data models was complete or close for 111 (93%). For the remainder, harmonisation was possible with a marginal loss of granularity. Although harmonisation is not an exact science, sufficient comparability across datasets was achieved to enable data discovery with relatively little loss of informativeness. This provides a basis for further work extending harmonisation to a larger variable list, applying the harmonisation to further datasets, and incentivising the development of data discovery tools.
Journal Article
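The three harmonisation strategies named in the record above (simple calibration, algorithmic transformation, and standardisation to the Z-distribution) can each be illustrated with a small function. The sketch below is only an illustration under stated assumptions: the variables (height, smoking status, a test score), the category codes and the function names are hypothetical and are not taken from the C-Surv data model.

```python
import statistics

INCH_TO_CM = 2.54  # exact conversion factor


def calibrate_height_to_cm(height_in_inches):
    """Simple calibration: rescale a measurement to a common unit."""
    return height_in_inches * INCH_TO_CM


# Hypothetical mapping from cohort-specific labels to a shared coding.
SMOKING_MAP = {"never": 0, "ex-smoker": 1, "former": 1, "current": 2}


def recode_smoking(label):
    """Algorithmic transformation: map local categories onto a common
    schema. Returns None when the label has no agreed equivalent."""
    return SMOKING_MAP.get(label.strip().lower())


def z_standardise(values):
    """Standardisation: express each value as a z-score within its own
    cohort (sample standard deviation is an assumption made here)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical usage with made-up values from one cohort.
print(calibrate_height_to_cm(65))
print([recode_smoking(s) for s in ["Never", "former", "Current"]])
print(z_standardise([24, 27, 30, 21]))
```

Calibration and recoding keep values on an interpretable common scale, whereas z-scores are only comparable relative to each cohort's own distribution, which is one way some granularity can be lost during harmonisation.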
Harmonising electronic health records for reproducible research: challenges, solutions and recommendations from a UK-wide COVID-19 research collaboration
by Bedston, Stuart; Wood, Angela; Mizani, Mehrdad A.
in Common data model, Consortia, COVID-19 - epidemiology
2023
Background
The CVD-COVID-UK consortium was formed to understand the relationship between COVID-19 and cardiovascular diseases through analyses of harmonised electronic health records (EHRs) across the four UK nations. Beyond COVID-19, data harmonisation and common approaches enable analysis within and across independent Trusted Research Environments. Here we describe the reproducible harmonisation method developed using large-scale EHRs in Wales to accommodate the fast and efficient implementation of cross-nation analysis in England and Wales as part of the CVD-COVID-UK programme. We characterise current challenges and share lessons learnt.
Methods
Serving the scope and scalability of multiple study protocols, we used linked, anonymised individual-level EHR, demographic and administrative data held within the SAIL Databank for the population of Wales. The harmonisation method was implemented as a four-layer reproducible process, starting from raw data in the first layer. Then each of the layers two to four is framed by, but not limited to, the characterised challenges and lessons learnt. We achieved curated data as part of our second layer, followed by extracting phenotyped data in the third layer. We captured any project-specific requirements in the fourth layer.
Results
Using the implemented four-layer harmonisation method, we retrieved approximately 100 health-related variables for the 3.2 million individuals in Wales, which are harmonised with corresponding variables for > 56 million individuals in England. We processed 13 data sources into the first layer of our harmonisation method: five of these are updated daily or weekly, and the rest at various frequencies providing sufficient data flow updates for frequent capturing of up-to-date demographic, administrative and clinical information.
Conclusions
We implemented an efficient, transparent, scalable, and reproducible harmonisation method that enables multi-nation collaborative research. With a current focus on COVID-19 and its relationship with cardiovascular outcomes, the harmonised data has supported a wide range of research activities across the UK.
Journal Article
Towards Sustainable Road Safety: Feature-Level Interpretation of Injury Severity in Poland (2015–2024) Using SHAP and XGBoost
2025
This study investigates the severity of injuries sustained by over seven million participants involved in road traffic incidents in Poland between 2015 and 2024, with a view to supporting sustainable mobility and the United Nations Sustainable Development Goals. Road safety is a crucial dimension of sustainable development, directly linked to public health, urban liveability, and the socio-economic costs of transportation systems. Using a harmonised participant-level dataset, this research identifies key demographic, behavioural, and environmental factors associated with injury outcomes. A novel five-level injury severity variable was developed by integrating inconsistent records on fatalities and injuries. Descriptive analyses revealed clear seasonal and weekly patterns, as well as substantial differences by participant type and driving licence status. Pedestrians and passengers faced the highest risk, with fatality rates more than five times higher than those of drivers. An XGBoost classifier was trained to predict injury severity, and SHAP analysis was applied to interpret the model’s outputs at the feature level. Participant role emerged as the most important predictor, followed by driving licence status, vehicle type, lighting conditions, and road geometry. These findings provide actionable insights for sustainable road safety interventions, including stronger protection for pedestrians and passengers, stricter enforcement against unlicensed driving, and infrastructural improvements such as better lighting and safer road design. By combining machine learning with interpretability tools, this study offers an analytical framework that can inform evidence-based policies aimed at reducing crash-related harm and advancing sustainable transport development.
Journal Article