Development and external validation of automated ICD-10 coding from discharge summaries using deep learning approaches

Ponthongmak, Wanchana; Thammasudjarit, Ratchainant; McKay, Gareth J.; Attia, John; Theera-Ampornpunt, Nawanan; Thakkinstian, Ammarin

Title: Development and external validation of automated ICD-10 coding from discharge summaries using deep learning approaches
Creator: Ponthongmak, Wanchana; Thammasudjarit, Ratchainant; McKay, Gareth J.; Attia, John; Theera-Ampornpunt, Nawanan; Thakkinstian, Ammarin
Relation: Informatics in Medicine Unlocked Vol. 38, no. 101227
Publisher Link: http://dx.doi.org/10.1016/j.imu.2023.101227
Publisher: Elsevier
Resource Type: journal article
Date: 2023
Description: Objectives: To develop an automated International Classification of Diseases (ICD) coding tool using natural language processing (NLP) and discharge summary texts from Thailand. Materials and methods: The development phase included 15,329 discharge summaries from Ramathibodi Hospital from January 2015 to December 2020. The external validation phase used Medical Information Mart for Intensive Care III (MIMIC-III) data. Three algorithms were developed: naïve Bayes with term frequency-inverse document frequency (NB-TF-IDF), convolutional neural network with neural word embedding (CNN-NWE), and CNN with PubMedBERT (CNN-PubMedBERT). In addition, two state-of-the-art models were considered: convolutional attention for multi-label classification (CAML) and pretrained language models for automatic ICD coding (PLM-ICD). Results: The CNN-PubMedBERT model achieved average micro- and macro-area under the precision-recall curve (AUPRC) of 0.6605 and 0.5538. Compared with CNN-NWE (0.6528 and 0.5564), NB-TF-IDF (0.4441 and 0.3562), and CAML (0.6257 and 0.4964), the corresponding differences were (0.0077 and −0.0026), (0.2164 and 0.1976), and (0.0348 and 0.0574), respectively; CNN-PubMedBERT therefore outperformed all three on micro-AUPRC and all but CNN-NWE on macro-AUPRC. However, CNN-PubMedBERT was outperformed by PLM-ICD, which achieved micro- and macro-AUPRCs of 0.7202 and 0.5865. The CNN-PubMedBERT model was externally validated using two subsets of MIMIC-III, the MIMIC-ICD-10 and MIMIC-ICD-9 datasets, which contained 40,923 and 31,196 discharge summaries, respectively. The average micro-AUPRCs were 0.3745, 0.6878, and 0.6699 for direct prediction on MIMIC-ICD-10, fine-tuning on MIMIC-ICD-10, and fine-tuning on MIMIC-ICD-9, respectively; the average macro-AUPRCs for the corresponding models were 0.2819, 0.4219, and 0.5377.
Discussion: CNN-PubMedBERT ranked second to PLM-ICD. Considerable variation was observed between average micro- and macro-AUPRC, especially in external validation, generally indicating good overall prediction but limited predictive value for codes with small sample sizes. External validation in a US cohort demonstrated a higher level of model prediction performance. Conclusion: Both the PLM-ICD and CNN-PubMedBERT models may provide useful tools for automated ICD-10 coding. Nevertheless, further evaluation and validation within Thai and Asian healthcare systems may prove more informative for clinical application.
Subject: deep learning; natural language processing; international classification of diseases; patient discharge summaries; SDG 17; Sustainable Development Goals
Identifier: http://hdl.handle.net/1959.13/1486793
Identifier: uon:51955
Identifier: ISSN:2352-9148
Language: eng
Reviewed
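
The gap between micro- and macro-AUPRC reported in the Description can be illustrated with a minimal sketch. It assumes AUPRC is computed as step-wise average precision, which this record does not confirm as the authors' method; the function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for one binary label."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("label has no positive examples")
    # Rank examples from highest to lowest score (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def macro_auprc(Y: List[List[int]], S: List[List[float]]) -> float:
    """Unweighted mean of per-label AUPRC: every code counts equally."""
    n_labels = len(Y[0])
    per_label = [
        average_precision([row[j] for row in Y], [row[j] for row in S])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


def micro_auprc(Y: List[List[int]], S: List[List[float]]) -> float:
    """AUPRC over all (document, label) pairs pooled together."""
    flat_y = [v for row in Y for v in row]
    flat_s = [v for row in S for v in row]
    return average_precision(flat_y, flat_s)


# Hypothetical toy data: 6 documents, 2 codes.
# Code 0 is common (4 positives) and ranked perfectly;
# code 1 is rare (1 positive) and ranked third among its scores.
Y = [[1, 0], [1, 0], [1, 0], [1, 1], [0, 0], [0, 0]]
S = [[0.9, 0.30], [0.8, 0.05], [0.7, 0.15],
     [0.6, 0.25], [0.2, 0.35], [0.1, 0.12]]

micro = micro_auprc(Y, S)
macro = macro_auprc(Y, S)
print(f"micro-AUPRC = {micro:.4f}, macro-AUPRC = {macro:.4f}")
```

In this toy case the pooled (micro) score is dominated by the well-predicted common code, while the macro score is pulled down by the single poorly ranked rare code, which is the pattern the Discussion attributes to small sample sizes.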
