Catalogue Search | MBRL
181 result(s) for "database harmonization"
Tailoring the Nutritional Composition of Italian Foods to the US Nutrition5k Dataset for Food Image Recognition: Challenges and a Comparative Analysis
2024
Background: Training of machine learning algorithms on dish images collected in other countries requires possible sources of systematic discrepancies, including country-specific food composition databases (FCDBs), to be tackled. The US Nutrition5k project provides for ~5000 dish images and related dish- and ingredient-level information on mass, energy, and macronutrients from the US FCDB. The aim of this study is to (1) identify challenges/solutions in linking the nutritional composition of Italian foods with food images from Nutrition5k and (2) assess potential differences in nutrient content estimated across the Italian and US FCDBs and their determinants. Methods: After food matching, expert data curation, and handling of missing values, dish-level ingredients from Nutrition5k were integrated with the Italian-FCDB-specific nutritional composition (86 components); dish-specific nutrient content was calculated by summing the corresponding ingredient-specific nutritional values. Measures of agreement/difference were calculated between Italian- and US-FCDB-specific content of energy and macronutrients. Potential determinants of identified differences were investigated with multiple robust regression models. Results: Dishes showed a median mass of 145 g and included three ingredients in median. Energy, proteins, fats, and carbohydrates showed moderate-to-strong agreement between Italian- and US-FCDB-specific content; carbohydrates showed the worst performance, with the Italian FCDB providing smaller median values (median raw difference between the Italian and US FCDBs: −2.10 g). Regression models on dishes suggested a role for mass, number of ingredients, and presence of recreated recipes, alone or jointly with differential use of raw/cooked ingredients across the two FCDBs. Conclusions: In the era of machine learning approaches for food image recognition, manual data curation in the alignment of FCDBs is worth the effort.
Journal Article
md_(h)armonize: A Python Package for Atom-Level Harmonization of Public Metabolic Databases
2023
A major challenge to integrating public metabolic resources is the use of different nomenclatures by individual databases. This paper presents md_harmonize, an open-source Python package for harmonizing compounds and metabolic reactions across various metabolic databases. The md_harmonize package utilizes a neighborhood-specific graph coloring method for generating a unique identifier for each compound via atom identifiers based on a compound’s chemical structure. The resulting harmonized compounds and reactions can be used for various downstream analyses, including the construction of atom-resolved metabolic networks and models for metabolic flux analysis. Parts of the md_harmonize package have been optimized using a variety of computational techniques to allow certain NP-complete problems handled by the software to be tractable for these specific use-cases. The software is available on GitHub and through the Python Package Index, with end-user documentation hosted on GitHub Pages.
Journal Article
Atom Identifiers Generated by a Neighborhood-Specific Graph Coloring Method Enable Compound Harmonization across Metabolic Databases
by Jin, Huan; Moseley, Hunter N. B.; Mitchell, Joshua M.
in atom identifier, atom-resolved metabolic network, compound identifier
2020
Metabolic flux analysis requires both a reliable metabolic model and reliable metabolic profiles in characterizing metabolic reprogramming. Advances in analytic methodologies enable production of high-quality metabolomics datasets capturing isotopic flux. However, useful metabolic models can be difficult to derive due to the lack of relatively complete atom-resolved metabolic networks for a variety of organisms, including human. Here, we developed a neighborhood-specific graph coloring method that creates unique identifiers for each atom in a compound facilitating construction of an atom-resolved metabolic network. What is more, this method is guaranteed to generate the same identifier for symmetric atoms, enabling automatic identification of possible additional mappings caused by molecular symmetry. Furthermore, a compound coloring identifier derived from the corresponding atom coloring identifiers can be used for compound harmonization across various metabolic network databases, which is an essential first step in network integration. With the compound coloring identifiers, 8865 correspondences between KEGG (Kyoto Encyclopedia of Genes and Genomes) and MetaCyc compounds are detected, with 5451 of them confirmed by other identifiers provided by the two databases. In addition, we found that the Enzyme Commission numbers (EC) of reactions can be used to validate possible correspondence pairs, with 1848 unconfirmed pairs validated by commonality in reaction ECs. Moreover, we were able to detect various issues and errors with compound representation in KEGG and MetaCyc databases by compound coloring identifiers, demonstrating the usefulness of this methodology for database curation.
Journal Article
md_harmonize: A Python Package for Atom-Level Harmonization of Public Metabolic Databases
2023
A major challenge to integrating public metabolic resources is the use of different nomenclatures by individual databases. This paper presents md_harmonize, an open-source Python package for harmonizing compounds and metabolic reactions across various metabolic databases. The md_harmonize package utilizes a neighborhood-specific graph coloring method for generating a unique identifier for each compound via atom identifiers based on a compound’s chemical structure. The resulting harmonized compounds and reactions can be used for various downstream analyses, including the construction of atom-resolved metabolic networks and models for metabolic flux analysis. Parts of the md_harmonize package have been optimized using a variety of computational techniques to allow certain NP-complete problems handled by the software to be tractable for these specific use-cases. The software is available on GitHub and through the Python Package Index, with end-user documentation hosted on GitHub Pages.
Journal Article
taxalogue: a toolkit to create comprehensive CO1 reference databases
by Noll, Niklas W.; Scherber, Christoph; Schäffler, Livia
in Bar codes, Databases, Factual, DNA - genetics
2023
Taxonomic identification through DNA barcodes gained considerable traction through the invention of next-generation sequencing and DNA metabarcoding. Metabarcoding allows for the simultaneous identification of thousands of organisms from bulk samples with high taxonomic resolution. However, reliable identifications can only be achieved with comprehensive and curated reference databases. Therefore, custom reference databases are often created to meet the needs of specific research questions. Due to taxonomic inconsistencies, formatting issues, and technical difficulties, building a custom reference database requires tremendous effort. Here, we present taxalogue, an easy-to-use software for creating comprehensive and customized reference databases that provide clean and taxonomically harmonized records. In combination with extensive geographical filtering options, taxalogue opens up new possibilities for generating and testing evolutionary hypotheses. taxalogue collects DNA sequences from several online sources and combines them into a reference database. Taxonomic incongruencies between the different data sources can be harmonized according to available taxonomies. Dereplication and various filtering options are available regarding sequence quality or metadata information. taxalogue is implemented in the open-source Ruby programming language, and the source code is available at https://github.com/nwnoll/taxalogue. We benchmark four reference databases by sequence identity against eight queries from different localities and trapping devices. Subsamples from each reference database were used to compare how well another one is covered. taxalogue produces reference databases with the best coverage at high identities for most tested queries, enabling more accurate, reliable predictions with higher certainty than the other benchmarked reference databases. Additionally, the performance of taxalogue is more consistent while providing good coverage for a variety of habitats, regions, and sampling methods. taxalogue simplifies the creation of reference databases and makes the process reproducible and transparent. Multiple available output formats for commonly used downstream applications facilitate the easy adoption of taxalogue in many different software pipelines. The resulting reference databases improve the taxonomic classification accuracy through high coverage of the query sequences at high identities.
Journal Article
Global trends in income inequality and income dynamics: New insights from GRID
by Guvenen, Fatih; Pistaferri, Luigi; Violante, Giovanni L
in Administrative data, cross-country, Econometrics
2022
The Global Repository of Income Dynamics (GRID) is a new open-access, cross-country database that contains a wide range of micro statistics on income inequality, dynamics, and mobility. It has four key characteristics: it is built on micro panel data drawn from administrative records; it fully exploits the longitudinal dimension of the underlying data sets; it offers granular descriptions of income inequality and income dynamics for finely defined subpopulations; and it is designed from the ground up with the goals of harmonization and cross-country comparability. This paper introduces the database and presents a set of global trends in income inequality and income dynamics across the 13 countries that are currently in GRID. Our results are based on the statistics created for GRID by the 13 country teams who also contributed to this special issue with individual articles.
Journal Article
The EurOPDX Data Portal: an open platform for patient-derived cancer xenograft data sharing and visualization
by Begley, Dale A.; Thorne, Ross; Follette, Alex
in Animal Genetics and Genomics, Animals, Biomedical and Life Sciences
2022
Background
Patient-derived xenografts (PDX) mice models play an important role in preclinical trials and personalized medicine. Sharing data on the models is highly valuable for numerous reasons – ethical, economical, research cross validation etc. The EurOPDX Consortium was established 8 years ago to share such information and avoid duplicating efforts in developing new PDX mice models and unify approaches to support preclinical research. EurOPDX Data Portal is the unified data sharing platform adopted by the Consortium.
Main body
In this paper we describe the main features of the EurOPDX Data Portal (https://dataportal.europdx.eu/), its architecture and possible utilization by researchers who look for PDX mice models for their research. The Portal offers a catalogue of European models accessible on a cooperative basis. The models are searchable by metadata, and a detailed view provides molecular profiles (gene expression, mutation, copy number alteration) and treatment studies. The Portal displays the data in multiple tools (PDX Finder, cBioPortal, and GenomeCruzer in future), which are populated from a common database displaying strictly mutually consistent views.
(Short) Conclusion
EurOPDX Data Portal is an entry point to the EurOPDX Research Infrastructure offering PDX mice models for collaborative research, (meta)data describing their features and deep molecular data analysis according to users’ interests.
Journal Article
Building a Digital Health Research Platform to Enable Recruitment, Enrollment, Data Collection, and Follow-Up for a Highly Diverse Longitudinal US Cohort of 1 Million People in the All of Us Research Program: Design and Implementation Study
by Sawyer, Sherilyn; Montgomery, Aisha; Palmer, Marcy
in Best practice, Biomedical Research, Biomedicine
2025
Longitudinal cohort studies have traditionally relied on clinic-based recruitment models, which limit cohort diversity and the generalizability of research outcomes. Digital research platforms can be used to increase participant access, improve study engagement, streamline data collection, and increase data quality; however, the efficacy and sustainability of digitally enabled studies rely heavily on the design, implementation, and management of the digital platform being used.
We sought to design and build a secure, privacy-preserving, validated, participant-centric digital health research platform (DHRP) to recruit and enroll participants, collect multimodal data, and engage participants from diverse backgrounds in the National Institutes of Health's (NIH) All of Us Research Program (AOU). AOU is an ongoing national, multiyear study aimed to build a research cohort of 1 million participants that reflects the diversity of the United States, including minority, health-disparate, and other populations underrepresented in biomedical research (UBR).
We collaborated with community members, health care provider organizations (HPOs), and NIH leadership to design, build, and validate a secure, feature-rich digital platform to facilitate multisite, hybrid, and remote study participation and multimodal data collection in AOU. Participants were recruited by in-person, print, and online digital campaigns. Participants securely accessed the DHRP via web and mobile apps, either independently or with research staff support. The participant-facing tool facilitated electronic informed consent (eConsent), multisource data collection (eg, surveys, genomic results, wearables, and electronic health records [EHRs]), and ongoing participant engagement. We also built tools for research staff to conduct remote participant support, study workflow management, participant tracking, data analytics, data harmonization, and data management.
We built a secure, participant-centric DHRP with engaging functionality used to recruit, engage, and collect data from 705,719 diverse participants throughout the United States. As of April 2024, 87% (n=613,976) of the participants enrolled via the platform were from UBR groups, including racial and ethnic minorities (n=282,429, 46%), rural dwelling individuals (n=49,118, 8%), those over the age of 65 years (n=190,333, 31%), and individuals with low socioeconomic status (n=122,795, 20%).
We built a participant-centric digital platform with tools to enable engagement with individuals from different racial, ethnic, and socioeconomic backgrounds and other UBR groups. This DHRP demonstrated successful use among diverse participants. These findings could be used as best practices for the effective use of digital platforms to build and sustain cohorts of various study designs and increase engagement with diverse populations in health research.
Journal Article
MINDMAP: establishing an integrated database infrastructure for research in ageing, mental well-being, and the urban environment
2018
Background
Urbanization and ageing have important implications for public mental health and well-being. Cities pose major challenges for older citizens, but also offer opportunities to develop, test, and implement policies, services, infrastructure, and interventions that promote mental well-being. The MINDMAP project aims to identify the opportunities and challenges posed by urban environmental characteristics for the promotion and management of mental well-being and cognitive function of older individuals.
Methods
MINDMAP aims to achieve its research objectives by bringing together longitudinal studies from 11 countries covering over 35 cities linked to databases of area-level environmental exposures and social and urban policy indicators. The infrastructure supporting integration of this data will allow multiple MINDMAP investigators to safely and remotely co-analyse individual-level and area-level data.
Individual-level data is derived from baseline and follow-up measurements of ten participating cohort studies and provides information on mental well-being outcomes, sociodemographic variables, health behaviour characteristics, social factors, measures of frailty, physical function indicators, and chronic conditions, as well as blood derived clinical biochemistry-based biomarkers and genetic biomarkers. Area-level information on physical environment characteristics (e.g. green spaces, transportation), socioeconomic and sociodemographic characteristics (e.g. neighbourhood income, residential segregation, residential density), and social environment characteristics (e.g. social cohesion, criminality) and national and urban social policies is derived from publicly available sources such as geoportals and administrative databases.
The linkage, harmonization, and analysis of data from different sources are being carried out using piloted tools to optimize the validity of the research results and transparency of the methodology.
Discussion
MINDMAP is a novel research collaboration that is combining population-based cohort data with publicly available datasets not typically used for ageing and mental well-being research. Integration of various data sources and observational units into a single platform will help to explain the differences in ageing-related mental and cognitive disorders both within as well as between cities in Europe, the US, Canada, and Russia and to assess the causal pathways and interactions between the urban environment and the individual determinants of mental well-being and cognitive ageing in older adults.
Journal Article
Approximation of the soil particle-size distribution curve using a NURBS curve
by Marhoul, Adéla Marie; Herza, Tomáš; Kozák, Josef
in Approximation, B spline functions, Classification systems
2025
Soil particle-size distribution, or soil texture, is one of the most important physical properties of soil. There are various classification systems for soil particle-size fractions, each with different boundaries. Our effort concentrated on a mathematical approach to evaluate the existing data and convert it into a reconstructed cumulative particle-size curve from which the concentration of any desired particle size can be read. Non-Uniform Rational B-Splines (NURBS) curves represent a generalization of B-splines and Bézier curves by extending the definition by an element of rationality, which is represented by the weights of the control points, and a nodal vector of parametrization, which represents the element of uniformity. The NURBS curve was used for smooth (depending on the degree of the curve used) and as tight as possible approximation of the arranged control points, the connecting lines of which form a convex envelope for its individual parts. The NURBS approximation curve is therefore determined by the ordered control points and their connecting lines, the weights of these points, the degree of the curve and the nodal vector of parametrization. However, the construction of the approximation curve is primarily dependent on a limited number of points of the experimentally determined particle-size distribution curves, and for curves with significant breaks in the course, one must consider either a lower accuracy of the approximation or the necessity of “improving” the approximation using the weights of individual points, inserting additional points or working with a nodal vector of parametrization. For basic approximation, the PUGIS system (Czech soil information system) offers automatic approximation using all variants mentioned in the article as well as the possibility of individual changes in the weights of control points, in their number and position, and in the nodal vector of parametrization.
Journal Article