Catalogue Search | MBRL
Explore the vast range of titles available.
24,844 result(s) for "Medical texts"
MHeTRep: A multilingual semantically tagged health terms repository
2023
This paper presents MHeTRep, a multilingual medical terminology, and the methodology followed for its compilation. The terminology is organised into one vocabulary per language. All the terms in the collection are semantically tagged with a tagset corresponding to the top categories of the Snomed-CT ontology. Where possible, individual terms are linked to their equivalents in the other languages. Even though many NLP resources and tools claim to be domain independent, their performance degrades notably when they are applied outside the domains for which they were built, so tuning to the new environment is needed. Usually, having a domain terminology facilitates and accelerates the adaptation of general-domain NLP applications to a new domain. This is particularly important in medicine, a domain undergoing rapid expansion. The proposed method takes Snomed-CT as its starting point. From there, using 13 multilingual resources covering the most relevant medical concepts, such as drugs, anatomy, clinical findings and procedures, we built a large resource covering seven languages and totalling more than two million semantically tagged terms. The resulting collection has been intensively evaluated in several ways across the languages and domain categories involved. Our hypothesis is that MHeTRep can be used advantageously over the original resources for a number of NLP use cases and can likely be extended to other languages.
Journal Article
Enhancing Diabetes Management With CRIBC: A Novel NER Model for Constructing A Comprehensive Chinese Medical Knowledge Graph
by Setyohadi, Djoko Budiyanto; Long, Zalizah Awang; Xu, Yiqing
in Chinese medical texts; CRIBC model; diabetes knowledge graph
2025
This study proposes CRIBC, a novel Named Entity Recognition (NER) model tailored for Chinese medical texts, specifically focusing on diabetes-related data. By improving entity recognition accuracy, CRIBC facilitates the construction of a comprehensive knowledge graph to enhance diabetes research and clinical decision-making. CRIBC integrates Chinese-RoBERTa-WWM-EXT, IDCNN, BiLSTM, and CRF to optimize entity extraction. The model was trained on the DiaKG dataset and validated on the CMeEE dataset. Performance was evaluated using precision, recall, and F1-score. A diabetes knowledge graph was then constructed from the extracted entities and relationships. CRIBC achieved an F1-score of 80.88% on the DiaKG dataset and 67.91% on the CMeEE dataset, outperforming baseline models. The constructed knowledge graph contains 23,134 nodes and 42,520 edges, providing structured insights into diabetes management and aiding clinical decision-making and medical research. CRIBC significantly enhances NER accuracy in Chinese medical texts, enabling efficient knowledge graph construction for diabetes management. Future research will focus on expanding datasets and refining the model's capabilities for broader medical applications.
Journal Article
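The abstract above describes building a knowledge graph from extracted entities and relationships. As a minimal sketch of that final step (not the paper's implementation; the entity and relation names below are invented for illustration), extracted (head, relation, tail) triples can be accumulated into a node set and labelled adjacency map:

```python
from collections import defaultdict

def build_graph(triples):
    """Accumulate (head, relation, tail) triples into a node set
    and an adjacency map keyed by head entity."""
    adjacency = defaultdict(list)
    nodes = set()
    for head, relation, tail in triples:
        nodes.update((head, tail))
        adjacency[head].append((relation, tail))
    return nodes, adjacency

# Hypothetical triples, as a NER + relation-extraction step might emit them
triples = [
    ("type 2 diabetes", "treated_with", "metformin"),
    ("type 2 diabetes", "has_symptom", "polyuria"),
    ("metformin", "class", "biguanide"),
]
nodes, adjacency = build_graph(triples)
print(len(nodes), sum(len(v) for v in adjacency.values()))  # node and edge counts
```

Counting nodes and edges this way is what yields figures like the 23,134 nodes and 42,520 edges reported above, once the triples come from a trained extractor rather than a hand-written list.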
Differentiating ChatGPT-Generated and Human-Written Medical Texts: Quantitative Study
2023
Large language models, such as ChatGPT, are capable of generating grammatically perfect and human-like text content, and a large number of ChatGPT-generated texts have appeared on the internet. However, medical texts, such as clinical notes and diagnoses, require rigorous validation, and erroneous medical content generated by ChatGPT could potentially lead to disinformation that poses significant harm to health care and the general public.
This study is among the first on responsible artificial intelligence-generated content in medicine. We focus on analyzing the differences between medical texts written by human experts and those generated by ChatGPT and designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT.
We first constructed a suite of data sets containing medical texts written by human experts and generated by ChatGPT. We analyzed the linguistic features of these 2 types of content and uncovered differences in vocabulary, parts-of-speech, dependency, sentiment, perplexity, and other aspects. Finally, we designed and implemented machine learning methods to detect medical text generated by ChatGPT. The data and code used in this paper are published on GitHub.
Medical texts written by humans were more concrete, more diverse, and typically contained more useful information, while medical texts generated by ChatGPT paid more attention to fluency and logic and usually expressed general terminologies rather than effective information specific to the context of the problem. A bidirectional encoder representations from transformers-based model effectively detected medical texts generated by ChatGPT, and the F1 score exceeded 95%.
Although text generated by ChatGPT is grammatically perfect and human-like, the linguistic characteristics of generated medical texts were different from those written by human experts. Medical text generated by ChatGPT could be effectively detected by the proposed machine learning algorithms. This study provides a pathway toward trustworthy and accountable use of large language models in medicine.
Journal Article
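The study above contrasts the vocabulary diversity of human-written and generated medical texts. One simple diversity feature of the kind such detection workflows compute is the type-token ratio; the sketch below is illustrative only and not the paper's feature set (the example sentences are invented):

```python
import re

def type_token_ratio(text):
    """Ratio of distinct words (types) to total words (tokens);
    higher values indicate a more diverse vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

concrete = "Patient reports fever, cough, and chest pain; chest X-ray shows left lower lobe infiltrate."
generic = "The patient has symptoms. The symptoms are common symptoms. The patient should see the doctor."
print(type_token_ratio(concrete), type_token_ratio(generic))
```

In a real workflow, features like this are computed per document and fed to a classifier alongside parts-of-speech, dependency, sentiment, and perplexity features.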
Medical Texts and Their Translation in Translator Training
2023
With the development of medicine, the demand for the translation of medical texts has increased significantly. Translations play an important role in disseminating medical knowledge and new medical discoveries and are vital in the provision of health services to foreigners, tourists, or minorities. Translating medical texts requires a variety of skills. In our study, we assess the extent to which translation and interpretation students at Sapientia Hungarian University of Transylvania are able to translate medical texts from English into their mother tongue (Hungarian) and into Romanian (the official language of the country). With the purpose of curriculum development, we examine whether the lack of medical knowledge affects the work of translators and what strategies can be used in the absence of this expertise. We also examine our students' attitudes toward translating medical texts and toward becoming medical translators.
Journal Article
Should free-text data in electronic medical records be shared for research? A citizens’ jury study in the UK
2020
Background
Use of routinely collected patient data for research and service planning is an explicit policy of the UK National Health Service and UK government. Much clinical information is recorded in free-text letters, reports and notes. These text data are generally lost to research, due to the increased privacy risk compared with structured data. We conducted a citizens' jury which asked members of the public whether their medical free-text data should be shared for research for public benefit, to inform an ethical policy.
Methods
Eighteen citizens took part over 3 days. Jurors heard a range of expert presentations as well as arguments for and against sharing free text, and then questioned presenters and deliberated together. They answered a questionnaire on whether and how free text should be shared for research, gave reasons for and against sharing and suggestions for alleviating their concerns.
Results
Jurors were in favour of sharing medical data and agreed this would benefit health research, but were more cautious about sharing free-text than structured data. They preferred processing of free text where a computer extracted information at scale. Their concerns were lack of transparency in uses of data, and privacy risks. They suggested keeping patients informed about uses of their data, and giving clear pathways to opt out of data sharing.
Conclusions
Informed citizens suggested a transparent culture of research for the public benefit, and continuous improvement of technology to protect patient privacy, to mitigate their concerns regarding privacy risks of using patient text data.
Journal Article
A neural network multi-task learning approach to biomedical named entity recognition
2017
Background
Named Entity Recognition (NER) is a key task in biomedical text mining. Accurate NER systems require task-specific, manually annotated datasets, which are expensive to develop and thus limited in size. Since such datasets contain related but different information, an interesting question is whether it might be possible to use them together to improve NER performance. To investigate this, we develop supervised, multi-task, convolutional neural network models and apply them to a large number of varied existing biomedical named entity datasets. Additionally, we investigate the effect of dataset size on performance in both single- and multi-task settings.
Results
We present a single-task model for NER, a Multi-output multi-task model and a Dependent multi-task model. We apply the three models to 15 biomedical datasets containing multiple named entities, including Anatomy, Chemical, Disease, Gene/Protein and Species. Each dataset represents a task. The results from the single-task model and the multi-task models are then compared for evidence of benefits from Multi-task Learning.
With the Multi-output multi-task model we observed an average F-score improvement of 0.8% when compared to the single-task model from an average baseline of 78.4%. Although there was a significant drop in performance on one dataset, performance improves significantly for five datasets by up to 6.3%. For the Dependent multi-task model we observed an average improvement of 0.4% when compared to the single-task model. There were no significant drops in performance on any dataset, and performance improves significantly for six datasets by up to 1.1%.
The dataset size experiments found that as dataset size decreased, the multi-output model’s performance increased compared to the single-task model’s. Using 50, 25 and 10% of the training data resulted in an average drop of approximately 3.4, 8 and 16.7% respectively for the single-task model but approximately 0.2, 3.0 and 9.8% for the multi-task model.
Conclusions
Our results show that, on average, the multi-task models produced better NER results than the single-task models trained on a single NER dataset. We also found that Multi-task Learning is beneficial for small datasets. Across the various settings the improvements are significant, demonstrating the benefit of Multi-task Learning for this task.
Journal Article
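The F-score comparisons above (and in several other results on this page) rest on the standard precision/recall/F1 computation over predicted versus gold entity spans. As a minimal sketch using exact-match counting (a simplification of full NER evaluation; the example spans are invented):

```python
def span_f1(gold, predicted):
    """Exact-match precision, recall and F1 over collections of
    (start, end, label) entity spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # spans correct in both position and label
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 4, "Disease"), (10, 14, "Chemical"), (20, 25, "Gene")]
predicted = [(0, 4, "Disease"), (10, 14, "Disease")]  # second span mislabelled
print(span_f1(gold, predicted))
```

Note that under exact matching, a span with the right boundaries but the wrong label counts as both a false positive and a false negative, which is why the mislabelled span above earns no credit.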
Hybrid natural language processing tool for semantic annotation of medical texts in Spanish
by Capllonch-Carrión, Adrián; Valverde-Mateos, Ana; Campillos-Llanos, Leonardo
in Algorithms; Analysis; Annotations
2025
Background
Natural language processing (NLP) enables the extraction of information embedded within unstructured texts, such as clinical case reports and trial eligibility criteria. By identifying relevant medical concepts, NLP facilitates the generation of structured and actionable data, supporting complex tasks like cohort identification and the analysis of clinical records. To accomplish those tasks, we introduce a deep learning-based and lexicon-based named entity recognition (NER) tool for texts in Spanish. It performs medical NER and normalization, medication information extraction, and detection of temporal entities, negation and speculation, and temporality or experiencer attributes (Age, Contraindicated, Negated, Speculated, Hypothetical, Future, Family_member, Patient and Other). We built the tool with a dedicated lexicon and rules adapted from NegEx and HeidelTime. Using these resources, we annotated a corpus of 1200 texts, with high inter-annotator agreement (average F1 = 0.841 ± 0.045 for entities, and average F1 = 0.881 ± 0.032 for attributes). We used this corpus to train Transformer-based models (RoBERTa-based models, mBERT and mDeBERTa). We integrated them with the dictionary-based system in a hybrid tool, and distributed the models via the Hugging Face hub. For an internal validation, we used a held-out test set and conducted an error analysis. For an external validation, eight medical professionals evaluated the system by revising the annotation of 200 new texts not used in development.
Results
In the internal validation, the models yielded F1 values up to 0.915. In the external validation with 100 clinical trials, the tool achieved an average F1 score of 0.858 (± 0.032); and in 100 anonymized clinical cases, it achieved an average F1 score of 0.910 (± 0.019).
Conclusions
The tool is available at https://claramed.csic.es/medspaner. We also release the code (https://github.com/lcampillos/medspaner) and the annotated corpus used to train the models.
Journal Article
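The tool above adapts rules from NegEx for negation detection. A minimal illustration of the rule-based idea (English trigger terms are used here purely for readability, whereas the tool itself targets Spanish; real NegEx lexicons are far larger, and this is not the tool's implementation):

```python
import re

# A few negation trigger phrases; real NegEx-style lexicons have many more.
NEGATION_TRIGGERS = ["no", "denies", "without", "absence of"]

def is_negated(sentence, concept):
    """Return True if a negation trigger appears in a short window
    of text just before the concept mention."""
    text = sentence.lower()
    idx = text.find(concept.lower())
    if idx == -1:
        return False  # concept not mentioned at all
    window = text[max(0, idx - 30):idx]  # look back a few words
    return any(re.search(r"\b" + re.escape(t) + r"\b", window)
               for t in NEGATION_TRIGGERS)

print(is_negated("Patient denies chest pain.", "chest pain"))
print(is_negated("Patient reports chest pain.", "chest pain"))
```

The hybrid design described in the abstract pairs rules like these with Transformer-based models, so that lexicon coverage and learned context complement each other.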
Identification and cause analysis on unplanned reoperations by text classification approach
2025
Unplanned reoperations (URs) not only increase the hospitalization period and healthcare cost, but also raise patients' risk of death. The analysis of URs is thus significant for their quality control and reduction. However, the massive text data generated in hospitals makes the identification of URs a tedious task with potential bias. Current research on URs is limited to data analysis and lacks automated classification using deep learning and natural language processing. Here we propose the UR-Net framework. It implements UR identification and cause analysis by processing the long texts of ward-round documentation and applying few-shot learning to multi-class cause classification. Our framework consists of URNet-XL, with a batch fusion method based on the XLNet model, and URNet-GT for cause classification, based on a pre-trained model combined with feature-extraction modules of multi-head attention and a bi-directional Gated Recurrent Unit. High weighted F1 scores of 96.34% and 93.37% are obtained for the respective processes in comparison with the baseline methods. The Area Under the receiver operating characteristic Curve (AUC) of 97.86% indicates excellent UR classification on the unbalanced dataset. Our approach provides a new route to UR identification and analysis, with the potential to reduce its occurrence.
Journal Article
Enhancing medical text classification with GAN-based data augmentation and multi-task learning in BERT
2025
With the rapid advancement of medical informatics, the accumulation of electronic medical records and clinical diagnostic data provides unprecedented opportunities for intelligent medical text classification. However, challenges such as class imbalance, semantic heterogeneity, and data sparsity limit the effectiveness of traditional classification models. In this study, we propose an enhanced medical text classification framework that integrates a self-attentive adversarial augmentation network (SAAN) for data augmentation with a disease-aware multi-task BERT (DMT-BERT) strategy. SAAN incorporates adversarial self-attention, improving the generation of high-quality minority-class samples while mitigating noise. DMT-BERT simultaneously learns medical text representations and disease co-occurrence relationships, enhancing feature extraction for rare symptoms. Extensive experiments on private clinical datasets and the public CCKS 2017 dataset demonstrate that our approach significantly outperforms baseline models, achieving the highest F1-score and ROC-AUC values. The proposed innovations address key limitations in medical text classification and contribute to the development of robust clinical decision-support systems.
Journal Article
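The framework above uses GAN-based augmentation to counter class imbalance. For contrast, the much simpler baseline it improves upon is random oversampling of minority classes; a minimal sketch (not the paper's SAAN method; the texts and labels are invented):

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the majority-class count."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # pad the group with random duplicates up to the target size
        resampled = group + [rng.choice(group) for _ in range(target - len(group))]
        out_samples += resampled
        out_labels += [label] * target
    return out_samples, out_labels

texts = ["note a", "note b", "note c", "rare note"]
labels = ["common", "common", "common", "rare"]
balanced_texts, balanced_labels = oversample(texts, labels)
print(Counter(balanced_labels))
```

Plain duplication adds no new information, which is exactly the limitation that adversarial generation of synthetic minority-class samples is meant to address.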