Catalogue Search | MBRL
11,431 result(s) for "Medical natural language processing"
Hybrid natural language processing tool for semantic annotation of medical texts in Spanish
by Capllonch-Carrión, Adrián; Valverde-Mateos, Ana; Campillos-Llanos, Leonardo
in Algorithms, Analysis, Annotations
2025
Background
Natural language processing (NLP) enables the extraction of information embedded within unstructured texts, such as clinical case reports and trial eligibility criteria. By identifying relevant medical concepts, NLP facilitates the generation of structured and actionable data, supporting complex tasks like cohort identification and the analysis of clinical records. To accomplish those tasks, we introduce a deep learning-based and lexicon-based named entity recognition (NER) tool for texts in Spanish. It performs medical NER and normalization, medication information extraction and detection of temporal entities, negation and speculation, and temporality or experiencer attributes (Age, Contraindicated, Negated, Speculated, Hypothetical, Future, Family_member, Patient and Other). We built the tool with a dedicated lexicon and rules adapted from NegEx and HeidelTime. Using these resources, we annotated a corpus of 1200 texts, with high inter-annotator agreement (average F1 = 0.841 ± 0.045 for entities, and average F1 = 0.881 ± 0.032 for attributes). We used this corpus to train Transformer-based models (RoBERTa-based models, mBERT and mDeBERTa). We integrated them with the dictionary-based system in a hybrid tool, and distribute the models via the Hugging Face hub. For internal validation, we used a held-out test set and conducted an error analysis. For external validation, eight medical professionals evaluated the system by revising the annotation of 200 new texts not used in development.
Results
In the internal validation, the models yielded F1 values up to 0.915. In the external validation with 100 clinical trials, the tool achieved an average F1 score of 0.858 (± 0.032); and in 100 anonymized clinical cases, it achieved an average F1 score of 0.910 (± 0.019).
Conclusions
The tool is available at https://claramed.csic.es/medspaner. We also release the code (https://github.com/lcampillos/medspaner) and the annotated corpus to train the models.
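The inter-annotator agreement reported in this abstract is an F1 score. As a rough illustration of how such a figure can be computed, here is a minimal sketch that assumes exact-match scoring over (start, end, label) spans and treats one annotator as the reference. The matching criterion, the function name, and the sample labels are assumptions for illustration, not the authors' evaluation code.

```python
def pairwise_f1(annotations_a, annotations_b):
    """F1 between two annotators, with A as reference and B as prediction.

    Assumption: two entities match only if start, end and label are identical.
    """
    set_a, set_b = set(annotations_a), set(annotations_b)
    if not set_a and not set_b:
        return 1.0  # both annotators marked nothing: full agreement
    matches = len(set_a & set_b)
    if matches == 0:
        return 0.0
    precision = matches / len(set_b)
    recall = matches / len(set_a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of one text: (start offset, end offset, label)
annotator_a = [(0, 9, "DISO"), (15, 26, "CHEM"), (30, 38, "PROC")]
annotator_b = [(0, 9, "DISO"), (15, 26, "CHEM"), (40, 45, "ANAT")]

score = pairwise_f1(annotator_a, annotator_b)
print(f"pairwise F1 = {score:.3f}")
```

With exact matching the score is symmetric in the two annotators, so averaging it over all annotator pairs and texts gives a single agreement figure of the kind quoted above.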
Journal Article
Diversity Learning Based on Multi-Latent Space for Medical Image Visual Question Generation
by Togo, Ren; Ogawa, Takahiro; Zhu, He
in Automation, Computational linguistics, computer vision
2023
Auxiliary clinical diagnosis has been researched as a way to address the uneven and insufficient distribution of clinical resources. However, auxiliary diagnosis is still dominated by human physicians, and how to involve intelligent systems more deeply in the diagnosis process is becoming a growing concern. An interactive automated clinical diagnosis with a question-answering system and a question generation system can capture a patient’s conditions from multiple perspectives with less physician involvement by asking different questions to drive and guide the diagnosis. This clinical diagnosis process requires diverse information to evaluate a patient from different perspectives to obtain an accurate diagnosis. Recently proposed medical question generation systems have not considered diversity. Thus, we propose a diversity learning-based visual question generation model using a multi-latent space to generate informative question sets from medical images. The proposed method generates various questions by embedding visual and language information in different latent spaces, whose diversity is trained by our newly proposed loss. We have also added control over the categories of generated questions, making the generated questions directional. Furthermore, we use a new metric named similarity to accurately evaluate the proposed model’s performance. The experimental results on the Slake and VQA-RAD datasets demonstrate that the proposed method can generate questions with diverse information. Our model works with an answering model for interactive automated clinical diagnosis and generates datasets to replace the process of annotation that incurs huge labor costs.
Journal Article
The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview
2020
Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in the Electronic Health Record system, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval.
Our objective was to release ClinicalSTS data sets and to motivate natural language processing and biomedical informatics communities to tackle semantic text similarity tasks in the clinical domain.
We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 1642 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune the semantic textual similarity systems and used the remaining 20% (412/2054) as blind testing to evaluate their systems. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium.
Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art training schemas in deep learning, such as pretraining and fine-tuning schema, and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, despite a much larger portion of the training data being GE sentence pairs.
The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.
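The systems in this track are ranked by Pearson correlation between their predicted similarity scores and the gold scores. A minimal sketch of that computation follows. The score values, the 0 to 5 range they suggest, and the function name are illustrative assumptions, not data or code from the shared task.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gold and system similarity scores for five sentence pairs
gold = [0.0, 1.5, 3.0, 4.5, 5.0]
predicted = [0.4, 1.2, 3.3, 4.1, 4.9]

r = pearson_r(gold, predicted)
print(f"r = {r:.4f}")
```

Because Pearson correlation is invariant to linear rescaling, a system is rewarded for ordering and spacing the pairs like the gold standard, not for matching the absolute score values.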
Journal Article
The robotic-surgery propositional bank
by Ponzetto, Simone Paolo; Fiorini, Paolo; Rospocher, Marco
in Algorithms, Annotations, Automation
2024
Robot-assisted minimally invasive surgery is the gold standard for the surgical treatment of many pathological conditions, since it guarantees the patient a shorter hospital stay and quicker recovery. Several manuals and academic papers describe how to perform these interventions and thus contain important domain-specific knowledge. This information, if automatically extracted and processed, can be used to extract or summarize surgical practices or to develop decision-making systems that help the surgeon or nurses optimize the patient’s management before, during, and after the surgery by providing theory-based suggestions. However, general English natural language understanding algorithms have lower efficacy and coverage issues when applied to domains other than those they are typically trained on, and a domain-specific annotated textual corpus is missing. To overcome this problem, we annotated the first robotic-surgery procedural corpus with PropBank-style semantic labels. Starting from the original PropBank framebank, we enriched it by adding new lemmas, frames and semantic arguments required to cover information missing in general English but needed in procedural surgical language, releasing the Robotic-Surgery Procedural Framebank (RSPF). We then collected as-is sentences from robotic-surgery textbooks, for a total of 32,448 tokens, and annotated them with RSPF labels. We thus obtained and publicly released the first annotated corpus of the robotic-surgical domain, which can be used to foster further research on language understanding and on the extraction of procedural entities and relations from clinical and surgical scientific literature.
Journal Article
Data Foundations for Medical AI: Provenance, Reliability and Limitations of Russian Clinical NLP Resources
by Litvinov, Arsenii; Bespalov, Iaroslav; Shlyakhto, Evgeniy
in Artificial intelligence, benchmarks, Clinical medicine
2026
Russian-language resources for medical natural language processing (NLP) are expanding rapidly; however, their fragmentation, uneven curation, and limited clinical reliability hinder the development of safe machine learning systems for prognosis, prevention, and precision medicine. We provide the first systematic survey of Russian medical NLP datasets and analyze their suitability for clinically meaningful tasks as defined by the MedHELM taxonomy. We additionally perform expert clinical validation of three representative public corpora—RuMedPrimeData (real outpatient notes), MedSyn (synthetic clinical notes), and RuMedNLI (translated natural language inference)—assessing clinical plausibility, diagnosis accuracy, and logical consistency. Experts identified substantial reliability issues: across randomly sampled subsets of each corpus, only approximately 20% of RuMedPrimeData records, fewer than 15% of MedSyn records, and approximately 55% of RuMedNLI pairs met essential quality criteria, which can hinder downstream ML systems built on these data. To support robust applications—ranging from medical chatbots and triage assistants to predictive and preventive models—we outline practical requirements for high-quality datasets: coordinated, expert-validated, machine-readable corpora aligned with clinical guidelines and insurance logic, standardized de-identification, and transparent provenance. Strengthening these data foundations will enable the development of reliable, reproducible, and clinically relevant AI systems suitable for real-world healthcare applications.
Journal Article
Balanced Knowledge Transfer in MTTL-ClinicalBERT: A Symmetrical Multi-Task Learning Framework for Clinical Text Classification
2025
Clinical text classification presents significant challenges in healthcare informatics due to inherent asymmetries in domain-specific terminology, knowledge distribution across specialties, and imbalanced data availability. We introduce MTTL-ClinicalBERT, a symmetrical multi-task transfer learning framework that harmonizes knowledge sharing across diverse medical specialties while maintaining balanced performance. Our approach addresses the fundamental problem of symmetry in knowledge transfer through three innovative components: (1) an adaptive knowledge distillation mechanism that creates symmetrical information flow between related medical domains while preventing negative transfer; (2) a bidirectional hierarchical attention architecture that establishes symmetry between local terminology analysis and global contextual understanding; and (3) a dynamic task-weighting strategy that maintains equilibrium in the learning process across asymmetrically distributed medical specialties. Extensive experiments on the MTSamples dataset demonstrate that our symmetrical approach consistently outperforms asymmetric baselines, achieving average improvements of 7.2% in accuracy and 6.8% in F1-score across five major specialties. The framework’s knowledge transfer patterns reveal a symmetric similarity matrix between specialties, with strongest bidirectional connections between cardiovascular/pulmonary and surgical domains (similarity score 0.83). Our model demonstrates remarkable stability and balance in low-resource scenarios, maintaining over 85% classification accuracy with only 30% of training data. The proposed framework not only advances clinical text classification through its symmetrical design but also provides valuable insights into balanced information sharing between different medical domains, with broader implications for symmetrical knowledge transfer in multi-domain machine learning systems.
Journal Article
Retrieval-Augmented Generation (RAG) in Healthcare: A Comprehensive Review
by Shukla, Deepak Kumar; Neha, Fnu; Bhati, Deepshikha
in Artificial intelligence, artificial intelligence (AI), biomedical natural language processing (NLP)
2025
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge retrieval to improve factual consistency and reduce hallucinations. Despite growing interest, its use in healthcare remains fragmented. This paper presents a Systematic Literature Review (SLR) following PRISMA guidelines, synthesizing 30 peer-reviewed studies on RAG in clinical domains and focusing on three of its most prevalent and promising applications: diagnostic support, electronic health record (EHR) summarization, and medical question answering. We synthesize the existing architectural variants (naïve, advanced, and modular) and examine their deployment across these applications. Persistent challenges are identified, including retrieval noise (irrelevant or low-quality retrieved information), domain shift (performance degradation when models are applied to data distributions different from their training set), generation latency, and limited explainability. Evaluation strategies are compared using both standard metrics and clinical-specific metrics (FactScore, RadGraph-F1, and MED-F1), which are particularly critical for ensuring factual accuracy, medical validity, and clinical relevance. This synthesis offers a domain-focused perspective to guide researchers, healthcare providers, and policymakers in developing reliable, interpretable, and clinically aligned AI systems, laying the groundwork for future innovation in RAG-based healthcare solutions.
Journal Article
From chaos to clarity: schema-constrained AI for auditable biomedical evidence extraction from full-text PDFs
by Mortezaagha, Pouria; Sun, Bowen; Rahgozar, Arya
in Accuracy, Artificial Intelligence, Atrial fibrillation
2026
Background
Biomedical evidence synthesis depends on accurate extraction of methodological, laboratory, and outcome variables from full-text research articles. These variables are predominantly embedded in complex scientific PDFs that interleave multi-column text, tables, figures, and captions, making manual abstraction time-intensive, error-prone, and increasingly impractical at the scale of contemporary systematic reviews. Despite advances in layout-aware and multimodal document models, end-to-end extraction systems suitable for evidence synthesis remain constrained by limited throughput, OCR error propagation, and insufficient auditability.
Methods
We propose a schema-constrained AI extraction system that transforms full-text biomedical PDFs into structured, analysis-ready records by explicitly restricting model inference through typed schemas, controlled vocabularies, and evidence-gated decisions. Documents are ingested using resume-aware hashing, partitioned into page-level and caption-aware chunks, and processed asynchronously under explicit concurrency and rate-limiting controls. A high-accuracy OCR model is guided by multiple domain-specific schemas covering bibliographic metadata, study design, populations, laboratory assays, timing and thresholds, clinical outcomes, and diagnostic performance. Chunk-level outputs are deterministically merged into study-level records using controlled vocabularies, conflict-aware handling of scalar fields, set-based aggregation of list-valued fields, and sentence-level evidence capture to enable traceability and post-hoc audit.
Results
Applied to a corpus of 734 biomedical articles on direct oral anticoagulant (DOAC) level measurement, the pipeline processed all documents without manual intervention while maintaining stable throughput. Schema-constrained extraction exhibited strong internal consistency, with sentence-level provenance populated for nearly all supported decisions. Iterative schema and prompt refinement yielded substantial improvements in extraction fidelity, particularly for outcome definitions, assay classification, and global coagulation testing. Outputs included reproducible CSV/Parquet datasets and caption-aware multimodal markdown reconstructions supporting efficient expert review.
Conclusions
Schema-constrained AI extraction enables scalable and auditable extraction of structured evidence from heterogeneous scientific PDFs. By combining deterministic chunking, asynchronous orchestration, controlled vocabularies, sentence-level provenance, and aggregated analytical outputs, the proposed pipeline aligns modern document understanding capabilities with the transparency, reproducibility, and reliability demands of biomedical evidence synthesis.
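The Methods above describe merging chunk-level outputs into study-level records with conflict-aware scalar handling, set-based aggregation of list fields, and sentence-level evidence capture. The sketch below shows one way such a merge could look. The field names, the record layout, and the policy of leaving a conflicting scalar empty and flagging it are all assumptions for illustration. The abstract does not specify them, and this is not the authors' pipeline.

```python
def merge_chunks(chunks, scalar_fields, list_fields):
    """Deterministically merge chunk-level extractions into one study record."""
    record = {"conflicts": {}, "evidence": {}}

    for field in scalar_fields:
        # Collect distinct non-empty values in first-seen (page) order
        values = []
        for chunk in chunks:
            value = chunk.get(field)
            if value is not None and value not in values:
                values.append(value)
        # Assumed policy: accept a scalar only if all chunks agree on it
        record[field] = values[0] if len(values) == 1 else None
        if len(values) > 1:
            record["conflicts"][field] = values

    for field in list_fields:
        # Set-based aggregation; sorting keeps the output reproducible
        merged = set()
        for chunk in chunks:
            merged.update(chunk.get(field, []))
        record[field] = sorted(merged)

    for chunk in chunks:
        # Keep the supporting sentence and its page for later audit
        for field, sentence in chunk.get("evidence", {}).items():
            record["evidence"].setdefault(field, []).append((chunk["page"], sentence))

    return record


# Hypothetical chunk outputs for a single article
chunks = [
    {"page": 1, "study_design": "cohort", "assays": ["anti-Xa"],
     "evidence": {"study_design": "We conducted a cohort study."}},
    {"page": 4, "study_design": "cohort", "assays": ["dTT", "anti-Xa"],
     "sample_size": 120},
    {"page": 7, "sample_size": 118},
]

record = merge_chunks(chunks, ["study_design", "sample_size"], ["assays"])
print(record)
```

Flagging disagreements instead of silently picking a value keeps the merge deterministic and leaves the final decision to a reviewer, which matches the auditability goal stated in the abstract.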
Journal Article
Knowledge-augmented pre-trained language models for biomedical relation extraction
2025
Automatic relationship extraction (RE) from biomedical literature is critical for managing the vast amount of scientific knowledge produced each year. In recent years, utilizing pre-trained language models (PLMs) has become the prevalent approach in RE. Several studies report improved performance when incorporating additional context information while fine-tuning PLMs for RE. However, variations in the PLMs applied, the databases used for augmentation, hyperparameter optimization, and evaluation methods complicate direct comparisons between studies and raise questions about the generalizability of these findings. Our study addresses this research gap by evaluating PLMs enhanced with contextual information on five datasets spanning four relation scenarios within a consistent evaluation framework. We evaluate three baseline PLMs and first conduct extensive hyperparameter optimization. After selecting the top-performing model, we enhance it with additional data, including textual entity descriptions, relational information from knowledge graphs, and molecular structure encodings. Our findings illustrate the importance of (1) the choice of the underlying language model and (2) a comprehensive hyperparameter optimization for achieving strong extraction performance. Although inclusion of context information yields only minor overall improvements, an ablation study reveals substantial benefits for smaller PLMs when such external data is included during fine-tuning.
Journal Article
Cross- & multi-lingual medication detection: a transformer-based analysis
by Kramer, Frank; Möller, Sebastian; Zweigenbaum, Pierre
in Analysis, Annotations, Computational linguistics
2025
Extracting specific information, such as medication mentions, from large unstructured medical texts can be challenging, especially when no annotated corpus exists in the target language for training. To overcome this, leveraging existing machine learning models and datasets is essential, and since most pre-trained resources are in English, adopting multilingual approaches can help transfer knowledge between languages. In this work, we investigate the use of a multi-lingual transformer model in multi-lingual and cross-lingual settings to extract drug names from medical texts using named entity recognition in four European languages: German, English, French, and Spanish. We report the scores obtained by cross-lingual transfer with several published datasets after fine-tuning a multi-lingual model, aiming to provide empirical evidence on how the transfer of “medical” knowledge between languages can be expected to benefit various language pairs. We further perform a qualitative error analysis and find that performance reaches competitive levels in all languages. However, erroneous prediction artifacts are introduced by annotation inconsistencies, differences in annotation guidelines, and vague entity labels in general.
Journal Article