Catalogue Search | MBRL
77 result(s) for "ICD coding"
Death Certification Errors and the Effect on Mortality Statistics
by McGivern, Lauri; Carney, Jan K.; Shapiro, Steven
in Cause of Death; Certificates; Certification
2017
Objective: Errors in cause and manner of death on death certificates are common and affect families, mortality statistics, and public health research. The primary objective of this study was to characterize errors in the cause and manner of death on death certificates completed by non–Medical Examiners. A secondary objective was to determine the effects of errors on national mortality statistics.
Methods: We retrospectively compared 601 death certificates completed between July 1, 2015, and January 31, 2016, from the Vermont Electronic Death Registration System with clinical summaries from medical records. Medical Examiners, blinded to original certificates, reviewed summaries, generated mock certificates, and compared mock certificates with original certificates. They then graded errors using a scale from 1 to 4 (higher numbers indicated increased impact on interpretation of the cause) to determine the prevalence of minor and major errors. They also compared International Classification of Diseases, 10th Revision (ICD-10) codes on original certificates with those on mock certificates.
Results: Of 601 original death certificates, 319 (53%) had errors; 305 (51%) had major errors; and 59 (10%) had minor errors. We found no significant differences by certifier type (physician vs nonphysician). We did find significant differences in major errors by place of death (P < .001). Certificates for deaths occurring in hospitals were more likely to have major errors than certificates for deaths occurring at a private residence (59% vs 39%, P < .001). A total of 580 (93%) death certificates had a change in ICD-10 codes between the original and mock certificates, of which 348 (60%) had a change in the underlying cause-of-death code.
Conclusions: Error rates on death certificates in Vermont are high and extend to ICD-10 coding, thereby affecting national mortality statistics. Surveillance and certifier education must expand beyond local and state efforts. Simplifying and standardizing underlying literal text for cause of death may improve accuracy, decrease coding errors, and improve national mortality statistics.
Journal Article
ICDXML: enhancing ICD coding with probabilistic label trees and dynamic semantic representations
2024
Accurately assigning standardized diagnosis and procedure codes from clinical text is crucial for healthcare applications. However, this remains challenging due to the complexity of medical language. This paper proposes a novel model that incorporates extreme multi-label classification tasks to enhance International Classification of Diseases (ICD) coding. The model utilizes deformable convolutional neural networks to fuse representations from hidden layer outputs of pre-trained language models and external medical knowledge embeddings fused using a multimodal approach to provide rich semantic encodings for each code. A probabilistic label tree is constructed based on the hierarchical structure existing in ICD labels to incorporate ontological relationships between ICD codes and enable structured output prediction. Experiments on medical code prediction on the MIMIC-III database demonstrate competitive performance, highlighting the benefits of this technique for robust clinical code assignment.
Journal Article
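The probabilistic label tree mentioned in the abstract above can be illustrated with a minimal sketch. This is not the ICDXML implementation: the two-level tree, the node names, and every probability below are hypothetical values chosen for illustration. The sketch shows only the general idea behind label trees, namely that a code's probability is the product of the conditional probabilities along its root-to-leaf path.

```python
# Hypothetical hierarchy (chapter -> code); all values are made up.
# Each entry is P(node is relevant | its parent is relevant).
COND_PROB = {
    "I": 0.8,     # chapter-level node
    "I10": 0.9,   # leaf under "I"
    "I50": 0.25,  # leaf under "I"
    "E": 0.1,     # chapter-level node
    "E11": 0.5,   # leaf under "E"
}
PARENT = {"I": None, "I10": "I", "I50": "I", "E": None, "E11": "E"}


def label_probability(label):
    """Multiply conditional probabilities from the label up to the root."""
    prob = 1.0
    node = label
    while node is not None:
        prob *= COND_PROB[node]
        node = PARENT[node]
    return prob


def predict(threshold=0.5):
    """Return leaf labels whose path probability reaches the threshold."""
    leaves = [n for n, parent in PARENT.items() if parent is not None]
    return sorted(leaf for leaf in leaves if label_probability(leaf) >= threshold)
```

In a trained model the conditional probabilities would come from classifiers at each tree node rather than from a fixed table.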
An explainable CNN approach for medical codes prediction from clinical text
2021
Background
Clinical notes are unstructured text documents generated by clinicians during patient encounters. They are generally annotated with International Classification of Diseases (ICD) codes, which give formatted information about the diagnosis and treatment. ICD codes have shown their potential in many fields, but manual coding is labor-intensive and error-prone, which has led to research on automatic coding. Two specific challenges of this task are (1) given annotated clinical notes, the reasons behind specific diagnoses and treatments are implicit; and (2) explainability is important for a practical automatic coding method: the method should not only explain its prediction output but also have explainable internal mechanics. This study aims to develop an explainable CNN approach to address these two challenges.
Method
Our key idea is that for the automatic ICD coding task, the presence of informative snippets in the clinical text that correlate with each code plays an important role in the prediction of codes, and an informative snippet can be considered a local and low-level feature. We infer that there exists a correspondence between a convolution filter and a local and low-level feature. Based on this inference, we propose the Shallow and Wide Attention convolutional Mechanism (SWAM) to improve the ability of CNN-based models to learn local and low-level features for each label.
Results
We evaluate our approach on MIMIC-III, an open-access dataset of ICU medical records. Our approach substantially outperforms previous results on top-50 medical code prediction on the MIMIC-III dataset: the precision of the worst-performing 10% of labels in previous works is increased from 0% to 53% on average. We attribute this improvement to SWAM, in which the wide architecture with an attention mechanism gives the model the ability to learn the unique features of different codes more extensively, and we support this with an ablation experiment. In addition, we manually analyze the performance imbalance between different codes and draw preliminary conclusions about the characteristics that determine the difficulty of learning specific codes.
Conclusions
Our main contributions can be summarized in the following three points: (1) We show that local and low-level features, a.k.a. informative snippets, play an important role in the automatic ICD coding task, and that the informative snippets extracted from the clinical text provide explanations for each code. (2) We propose that there exists a correspondence between a convolution filter and a local and low-level feature. A combination of a wide and shallow convolutional layer and an attention layer can help CNN-based models better learn local and low-level features. (3) We improved the precision of the worst-performing 10% of labels from 0% to 53% on average.
Journal Article
Using Medical Named Entity Recognition in Automatic ICD Prediction
by Dashash, Mayssoon; Kawas, Mohamad; Alkhatib, Bassel
in Algorithms; Artificial Intelligence; Databases, Factual
2025
The International Classification of Diseases (ICD) serves as a standard in medical coding. Researchers in artificial intelligence, including those focused on natural language processing and machine learning, have made a significant effort to build and develop automatic ICD encoding systems and algorithms. Many algorithms have been developed to implement automatic ICD encoding, but almost all of these algorithms depended on the raw text input without taking into consideration the important medical entities in this input. In this paper, we propose an algorithm for automatically predicting ICD codes based on patient claims. Our algorithm contains several steps for finding the most relevant ICD codes. Primarily, our proposed algorithm employs medical named entity recognition (NER) to find the most important medical entities in a patient claim. For this purpose, the Medical NER model was used based on the BERT model. Next, the algorithm generates embeddings for the extracted entities using the ClinicalBERT model. To identify the most relevant ICD code, the algorithm creates embeddings for an ICD catalog, which contains various information such as chapter descriptions, long descriptions, short descriptions, and ICD codes. The embedding process is primarily based on the long descriptions, and the results are stored in a local database that contains embedding vectors and corresponding mapped ICD codes. The final step of the algorithm calculates the cosine similarity between the embedding vector generated from the patient complaint and the ICD long description vectors. The strength of this new algorithm is that it first detects the medical entities in the textual input and then predicts the most similar ICD codes. Also, our developed algorithm does not need such huge data for training. We tested the developed algorithm on a medical dataset, and the results indicate that the proposed method is highly efficient, achieving a precision rate of approximately 90%.
Journal Article
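The final ranking step described in the abstract above (cosine similarity between an embedding of the patient text and stored embeddings of ICD long descriptions) can be sketched as follows. This is a minimal illustration, not the authors' code: the three-dimensional vectors stand in for real ClinicalBERT embeddings, and the names `ICD_INDEX`, `rank_codes`, and the `CODE_*` keys are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical stand-in for the local database of description embeddings.
ICD_INDEX = {
    "CODE_A": [1.0, 0.0, 0.0],
    "CODE_B": [0.0, 1.0, 0.0],
    "CODE_C": [0.6, 0.8, 0.0],
}


def rank_codes(query_vec, index, top_k=2):
    """Return the top_k (code, similarity) pairs, most similar first."""
    scored = [(code, cosine_similarity(query_vec, vec)) for code, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Calling `rank_codes([0.0, 1.0, 0.0], ICD_INDEX)` ranks `CODE_B` first and `CODE_C` second. A real system would replace the toy vectors with model embeddings and, for a large catalog, probably a vector index rather than a linear scan.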
Explainable Prediction of Medical Codes With Knowledge Graphs
by Xu, Qiang; Huang, LuFei; Teng, Fei
in automated ICD coding; Bioengineering and Biotechnology; Classification
2020
The International Classification of Diseases (ICD) is an authoritative health care classification system of different diseases. It is widely used for disease and health records, assisted medical reimbursement decisions, and collecting morbidity and mortality statistics. Most existing ICD coding models only translate simple diagnosis descriptions into ICD codes, which obscures the reasons and details behind specific diagnoses. In addition, the label (code) distribution is uneven, and there are dependencies between labels. Based on these considerations, the knowledge graph and attention mechanism were extended to medical code prediction to improve interpretability. In this study, a new method called G_Coder was presented, which mainly consists of Multi-CNN, graph presentation, attentional matching, and adversarial learning. The medical knowledge graph was constructed by extracting entities related to ICD-9 from Freebase. The ontology contains 5 entity classes: disease, symptom, medicine, surgery, and examination. On the MIMIC-III dataset, G_Coder achieved a micro-F1 score of 69.2%, surpassing the state of the art. The following conclusions can be drawn from the experiment: G_Coder integrates information across medical records using Multi-CNN and embeds knowledge into ICD codes. Adversarial learning is used to generate adversarial samples to reconcile the writing styles of doctors. With the knowledge graph and attention mechanism, the most relevant segments of medical codes can be explained. This suggests that the knowledge graph significantly improves the precision of code prediction and reduces the working pressure on human coders.
Journal Article
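Several abstracts in these results report micro-F1 for multi-label code prediction. As a reminder of what that metric computes, here is a minimal sketch that pools true positives, false positives, and false negatives over all records before computing precision and recall. The function name and the toy label sets are illustrative assumptions, not taken from any of the listed papers.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over records, each given as a set of codes."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # codes predicted and correct
        fp += len(pred - truth)   # codes predicted but wrong
        fn += len(truth - pred)   # codes missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up codes: 2 hits, 1 false positive, 1 miss.
truth = [{"A", "B"}, {"C"}]
predicted = [{"A"}, {"C", "D"}]
score = micro_f1(truth, predicted)
```

Because counts are pooled across all labels, frequent codes dominate micro-F1, which is one reason results on rare codes are often reported separately.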
Effects of inappropriate cause-of-death certification on mortality from cardiovascular disease and diabetes mellitus in Tonga
by Figueroa, Carah A.; Dearie, Catherine; Kupu, Sioape
in Adult mortality; Analysis; Arteriosclerosis
2023
Background
Cardiovascular disease (CVD) and diabetes mellitus are major health issues in Tonga and other Pacific countries, although mortality levels and trends are unclear. We assess the impacts of cause-of-death certification on coding of CVD and diabetes as underlying causes of death (UCoD).
Methods
Tongan records containing cause-of-death data (2001–2018), including medical certificates of cause-of-death (MCCD), had UCoD assigned according to International Classification of Diseases 10th revision (ICD-10) coding rules. Deaths without recorded cause were included to ascertain total mortality. Diabetes and hypertension causes were reallocated from Part 1 of the MCCD (direct cause) to Part 2 (contributory cause) if potentially fatal complications were not recorded, and an alternative UCoD was assigned. Proportional mortality by cause based on the alternative UCoD was applied to total deaths, and mortality rates were then calculated by age and sex using census/intercensal population estimates. CVD and diabetes mortality rates for unaltered and alternative UCoD were compared using Poisson regression.
Results
Over 2001–18, in ages 35–59 years, alternative CVD mortality was higher than unaltered CVD mortality in men (p = 0.043) and women (p = 0.15); for 2010–18, alternative versus unaltered measures in men were 3.3/10³ (95%CI: 3.0–3.7/10³) versus 2.9/10³ (95%CI: 2.6–3.2/10³), and in women were 1.1/10³ (95%CI: 0.9–1.3/10³) versus 0.9/10³ (95%CI: 0.8–1.1/10³). Conversely, alternative diabetes mortality rates were significantly lower than the unaltered rates over 2001–18 in men (p < 0.0001) and women (p = 0.013); for 2010–18, these measures in men were 1.3/10³ (95%CI: 1.1–1.5/10³) versus 1.9/10³ (95%CI: 1.6–2.2/10³), and in women were 1.4/10³ (95%CI: 1.2–1.7/10³) versus 1.7/10³ (95%CI: 1.5–2.0/10³). Diabetes mortality rates increased significantly over 2001–18 in men (unaltered: p < 0.0001; alternative: p = 0.0007) and increased overall in women (unaltered: p = 0.0015; alternative: p = 0.014).
Conclusions
Diabetes reporting in Part 1 of the MCCD, without potentially fatal diabetes complications, has led to over-estimation of diabetes, and under-estimation of CVD, as UCoD in Tonga. This indicates the importance of controlling various modifiable risks for atherosclerotic CVD (including stroke) including hypertension, tobacco use, and saturated fat intake, besides obesity and diabetes. Accurate certification of diabetes as a direct cause of death (Part 1) or contributory factor (Part 2) is needed to ensure that valid UCoD are assigned. Examination of multiple cause-of-death data can improve understanding of the underlying causes of premature mortality to better inform health planning.
Journal Article
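The rate calculation described in the Methods of the abstract above (proportional mortality by cause applied to total deaths, then divided by a population estimate) can be sketched as simple arithmetic. The function name and all numbers below are hypothetical and are not the study's data; the study additionally stratified by age and sex and compared rates with Poisson regression, which this sketch does not attempt.

```python
def cause_specific_rate(cause_deaths_certified, all_deaths_certified,
                        total_deaths, population, per=1000):
    """Estimate a cause-specific mortality rate per `per` population.

    The share of certified deaths attributed to the cause is applied to
    total deaths (including deaths without a recorded cause), and the
    result is divided by the population estimate.
    """
    proportion = cause_deaths_certified / all_deaths_certified
    estimated_deaths = proportion * total_deaths
    return estimated_deaths / population * per


# Made-up inputs: 50 of 200 certified deaths assigned to the cause,
# 300 deaths in total, population estimate of 25,000.
rate = cause_specific_rate(50, 200, 300, 25000)
```

With these made-up inputs the proportion is 0.25, the estimated number of deaths is 75, and the rate is 3.0 per 1,000. Reassigning the underlying cause changes only `cause_deaths_certified`, which is how certification practice feeds directly into the reported rate.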
Comparison of different feature extraction methods for applicable automated ICD coding
2022
Background
Automated ICD coding on medical texts via machine learning has been a hot topic. Related studies from the medical field rely heavily on conventional bag-of-words (BoW) as the feature extraction method, and do not commonly use more complicated methods, such as word2vec (W2V) and large pretrained models like BERT. This study aimed at uncovering the most effective feature extraction methods for coding models by comparing BoW, W2V and BERT variants.
Methods
We experimented with a Chinese dataset from Fuwai Hospital, which contains 6947 records and 1532 unique ICD codes, and a public Spanish dataset, which contains 1000 records and 2557 unique ICD codes. We designed coding tasks with different code frequency thresholds (denoted as f_s), with a lower threshold indicating a more complex task. Using traditional classifiers, we compared BoW, W2V and BERT variants on accomplishing these coding tasks.
Results
When f_s was equal to or greater than 140 for the Fuwai dataset, and 60 for the Spanish dataset, the BERT variants with the whole network fine-tuned were the best method, leading to a Micro-F1 of 93.9% for the Fuwai dataset when f_s = 200, and a Micro-F1 of 85.41% for the Spanish dataset when f_s = 180. When f_s fell below 140 for the Fuwai dataset, and 60 for the Spanish dataset, BoW turned out to be the best, leading to a Micro-F1 of 83% for the Fuwai dataset when f_s = 20, and a Micro-F1 of 39.1% for the Spanish dataset when f_s = 20. Our experiments also showed that both the BERT variants and BoW possessed good interpretability, which is important for medical applications of coding models.
Conclusions
This study shed light on building promising machine learning models for automated ICD coding by revealing the most effective feature extraction methods. Concretely, our results indicated that fine-tuning the whole network of the BERT variants was the optimal method for tasks covering only frequent codes, especially codes that represented unspecified diseases, while BoW was the best for tasks involving both frequent and infrequent codes. The frequency threshold where the best-performing method varied differed between different datasets due to factors like language and codeset.
Journal Article
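The code frequency threshold f_s used in the abstract above can be illustrated with a short sketch. The abstract does not spell out the exact filtering rule, so the version below is an assumption: a code is kept if it appears in at least `f_s` records, and records left with no codes are dropped. The function name and the toy records are hypothetical.

```python
from collections import Counter


def filter_by_code_frequency(records, f_s):
    """Keep codes that occur in at least f_s records.

    `records` is a list of (text, codes) pairs. Returns the filtered
    records and the set of codes that met the threshold.
    """
    # Count each code once per record, even if it is listed twice.
    counts = Counter(code for _, codes in records for code in set(codes))
    frequent = {code for code, n in counts.items() if n >= f_s}
    filtered = []
    for text, codes in records:
        kept = [c for c in codes if c in frequent]
        if kept:  # drop records with no remaining codes
            filtered.append((text, kept))
    return filtered, frequent


# Toy data with made-up codes.
toy_records = [
    ("note 1", ["X1", "X2"]),
    ("note 2", ["X1"]),
    ("note 3", ["X3"]),
]
filtered, frequent = filter_by_code_frequency(toy_records, f_s=2)
```

Lowering `f_s` keeps more rare codes in the label set, which matches the abstract's remark that a lower threshold means a more complex task.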
GoM-ICD: Automatic ICD Coding with Gap Schemes and Mixture of Experts
by Qiu, Weiyan; Zeng, Min; Zhu, Hongtao
in automatic international classification of disease (icd) coding; Big Data; Classification
2025
Assigning standardized International Classification of Disease (ICD) codes to Electronic Medical Records (EMR) is crucial for enhancing the efficiency and accuracy of medical coding processes. However, existing methods face challenges in effectively capturing and integrating specialized medical knowledge from complex textual data. In this study, we propose GoM-ICD, an automatic ICD coding framework that integrates multiple gap schemes with a Mixture of Experts (MoE) architecture. GoM-ICD is designed to address the extreme multilabel text classification in ICD coding. It segments and reassembles text to facilitate information exchange across different chunks, employing various segmentation methods derived from different gap schemes. Then the model-level MoE consolidates the predictions of these methods to enhance the prediction performance. Specifically, the segmented text is input to a Pretrained Language Model (PLM) to extract textual features. Next, a Bidirectional Long Short-Term Memory network (BiLSTM) is employed to capture long-term contextual information from the textual features. Finally, a text-level MoE, followed by a label-level MoE, enables precise attention matching between text and labels, thereby improving the fidelity of the coding process. The three levels of MoE leverage the collective insights of diverse expert models, effectively processing multi-dimensional text features and unifying model-level insights from various gap schemes. Extensive experimental results demonstrate that GoM-ICD achieves state-of-the-art performance in automatic ICD coding tasks, reaching micro-F1 of 0.617, 0.620, and 0.613 on datasets MIMIC-III full, MIMIC-III clean, and MIMIC-IV ICD-10, respectively. The source code can be obtained from https://github.com/CSUBioGroup/GoM-ICD.
Journal Article
Knowledge guided multi-filter residual convolutional neural network for ICD coding from clinical text
by Goswami, Prantik; Jürjens, Jan; Boukhers, Zeyd
in Artificial Intelligence; Artificial neural networks; Coding
2023
A common challenge encountered when using Deep Neural Network models for automatic ICD coding is their potential inability to effectively handle unseen clinical texts, especially when these models are only trained on a limited number of examples. This is because these models rely solely on the patterns and relationships present in the training data, and may not be able to effectively incorporate additional knowledge about the relationships between medical entities. To address this issue, we introduce KG-MultiResCNN, a Knowledge Guided Multi-filter Residual Convolutional Neural Network model, which combines training examples with external knowledge from the Wikidata Knowledge Graph (KG) in order to better capture the relationships between medical entities. The KG is a structured database that contains a wealth of information about various entities, including medical concepts and their relationships with one another. By incorporating this external knowledge into our model, we are able to improve its ability to predict ICD codes for new clinical texts. In our experiments with the MIMIC-III dataset, we found that the KG-MultiResCNN model significantly outperformed the baseline approaches. This demonstrates the effectiveness of using external knowledge, in addition to training examples, to improve the performance of deep learning models for automatic ICD coding.
Journal Article
Underestimated prevalence of heart failure in hospital inpatients: a comparison of ICD codes and discharge letter information
by Ertl, Maximilian; Puppe, Frank; Güder, Gülmisal
in Algorithms; Beta blockers; Data warehouses
2018
Background
Heart failure is the predominant cause of hospitalization and amongst the leading causes of death in Germany. However, accurate estimates of prevalence and incidence are lacking. Reported figures originating from different information sources are compromised by factors like economic reasons or documentation quality.
Methods
We implemented a clinical data warehouse that integrates various information sources (structured parameters, plain text, data extracted by natural language processing) and enables reliable approximations to the real number of heart failure patients. Performance of ICD-based diagnosis in detecting heart failure was compared across the years 2000–2015 with (a) advanced definitions based on algorithms that integrate various sources of the hospital information system, and (b) a physician-based reference standard.
Results
Applying these methods for detecting heart failure in inpatients revealed that relying on ICD codes resulted in a marked underestimation of the true prevalence of heart failure, ranging from 44% in the validation dataset to 55% (single year) and 31% (all years) in the overall analysis. Percentages changed over the years, indicating secular changes in coding practice and efficiency. Performance was markedly improved using search and permutation algorithms from the initial expert-specified query (F1 score of 81%) to the computer-optimized query (F1 score of 86%) or, alternatively, optimizing precision or sensitivity depending on the search objective.
Conclusions
Estimating prevalence of heart failure using ICD codes as the sole data source yielded unreliable results. Diagnostic accuracy was markedly improved using dedicated search algorithms. Our approach may be transferred to other hospital information systems.
Journal Article