Comparison of different feature extraction methods for applicable automated ICD coding
by Yanni Huo, Wei Zhao, Jing Yuan, Yuxin Wang, Meng Cui, Shuai Zhao, Xiaolin Diao
in Automated ICD coding / Automation / Bag-of-words / BERT / Coding / Computational linguistics / Datasets / Deep learning / Extraction (Chemistry) / Feature extraction / Health Informatics / Information Systems and Communication Service / Interpretability / Knowledge / Language processing / Learning algorithms / Machine learning / Management of Computing and Information Systems / Medicine / Medicine & Public Health / Methods / Natural language interfaces / Neural networks / Semantics / Support vector machines / Uniqueness / Word2vec
2022
Journal Article
Overview
Background
Automated ICD coding of medical texts via machine learning has been an active research topic. Related studies from the medical field rely heavily on conventional bag-of-words (BoW) as the feature extraction method and seldom use more complex methods, such as word2vec (W2V) and large pretrained models like BERT. This study aimed to identify the most effective feature extraction methods for coding models by comparing BoW, W2V and BERT variants.
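As a rough illustration of what BoW feature extraction means, the sketch below turns each text into a vector of token counts over a shared vocabulary. It is a minimal example under stated assumptions and not the pipeline used in the study: the function names `build_vocabulary` and `bag_of_words` are hypothetical, the sample texts are invented, and whitespace tokenization is assumed (Chinese text would first need a word segmenter).

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted token order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(document.lower().split())
    # Dict iteration follows insertion order, i.e. the sorted token order.
    return [counts.get(tok, 0) for tok in vocabulary]


# Invented toy texts, for illustration only.
docs = ["chest pain chest", "acute chest infection"]
vocab = build_vocabulary(docs)  # columns: acute, chest, infection, pain
vectors = [bag_of_words(d, vocab) for d in docs]
```

Word order is discarded, which is the main contrast with W2V and BERT representations.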
Methods
We experimented with a Chinese dataset from Fuwai Hospital, which contains 6947 records and 1532 unique ICD codes, and a public Spanish dataset, which contains 1000 records and 2557 unique ICD codes. We designed coding tasks with different code frequency thresholds (denoted as f_s), with a lower threshold indicating a more complex task. Using traditional classifiers, we compared BoW, W2V and BERT variants on these coding tasks.
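One way to picture a task defined by a code frequency threshold is sketched below. The abstract does not say exactly how the threshold was applied, so this is an assumption: a code is kept if it appears in at least f_s records, and records left with no codes are dropped. The helper name `filter_by_code_frequency` and the toy records are hypothetical.

```python
from collections import Counter


def filter_by_code_frequency(records, f_s):
    """Keep codes that appear in at least f_s records (assumed rule).

    records: list of (text, codes) pairs.
    Returns the filtered records and the set of retained codes.
    """
    # Count each code once per record.
    freq = Counter(code for _, codes in records for code in set(codes))
    kept = {code for code, n in freq.items() if n >= f_s}
    filtered = []
    for text, codes in records:
        remaining = [c for c in codes if c in kept]
        if remaining:  # drop records with no retained code
            filtered.append((text, remaining))
    return filtered, kept


# Invented toy records, for illustration only.
records = [
    ("a", ["I10", "E11"]),
    ("b", ["I10"]),
    ("c", ["I10", "J18"]),
    ("d", ["E11"]),
]
easy_task, easy_codes = filter_by_code_frequency(records, f_s=3)
hard_task, hard_codes = filter_by_code_frequency(records, f_s=2)
```

Under this reading, lowering f_s enlarges the label set with less frequent codes, which is consistent with the statement that a lower threshold means a more complex task.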
Results
When f_s was equal to or greater than 140 for the Fuwai dataset and 60 for the Spanish dataset, the BERT variants with the whole network fine-tuned were the best method, reaching a Micro-F1 of 93.9% on the Fuwai dataset at f_s = 200 and a Micro-F1 of 85.41% on the Spanish dataset at f_s = 180. When f_s fell below 140 for the Fuwai dataset and 60 for the Spanish dataset, BoW turned out to be the best, reaching a Micro-F1 of 83% on the Fuwai dataset at f_s = 20 and a Micro-F1 of 39.1% on the Spanish dataset at f_s = 20. Our experiments also showed that both the BERT variants and BoW possessed good interpretability, which is important for medical applications of coding models.
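For readers unfamiliar with the metric, Micro-F1 pools true positives, false positives and false negatives over all records and codes before computing F1, so frequent codes weigh more than rare ones. The sketch below is a minimal illustration under that standard definition; the function name `micro_f1` and the toy labels are hypothetical, and nothing here reproduces the study's evaluation code.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over per-record sets of codes."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        true, pred = set(true), set(pred)
        tp += len(true & pred)   # codes predicted and correct
        fp += len(pred - true)   # codes predicted but wrong
        fn += len(true - pred)   # codes missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented toy labels: tp = 2, fp = 1, fn = 1, so F1 = 4 / 6.
score = micro_f1(
    [{"I10", "E11"}, {"I10"}],
    [{"I10"}, {"I10", "J18"}],
)
```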
Conclusions
This study shed light on building promising machine learning models for automated ICD coding by revealing the most effective feature extraction methods. Concretely, our results indicated that fine-tuning the whole network of the BERT variants was the optimal method for tasks covering only frequent codes, especially codes that represented unspecified diseases, while BoW was the best for tasks involving both frequent and infrequent codes. The frequency threshold at which the best-performing method changed differed between datasets due to factors like language and codeset.
Publisher
BioMed Central, BioMed Central Ltd, Springer Nature B.V, BMC