TY - JOUR
T1 - Automated Laryngeal Invasion Detector of Boluses in Videofluoroscopic Swallowing Study Videos Using Action Recognition-Based Networks
AU - Nam, Kihwan
AU - Lee, Changyeol
AU - Lee, Taeheon
AU - Shin, Munseop
AU - Kim, Bo Hae
AU - Park, Jin Woo
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/7
Y1 - 2024/7
N2 - We aimed to develop an automated detector that determines laryngeal invasion during swallowing. Laryngeal invasion, which causes significant clinical problems, is defined as two or more points on the penetration–aspiration scale (PAS). We applied two three-dimensional (3D) stream networks for action recognition in videofluoroscopic swallowing study (VFSS) videos. To detect laryngeal invasion (PAS 2 or higher scores) in VFSS videos, we employed two 3D stream networks for action recognition. To establish the robustness of our model, we compared its performance with those of various current image classification-based architectures. The proposed model achieved an accuracy of 92.10%. Precision, recall, and F1 scores for detecting laryngeal invasion (≥PAS 2) in VFSS videos were 0.9470 each. The accuracy of our model in identifying laryngeal invasion surpassed that of other updated image classification models (60.58% for ResNet101, 60.19% for Swin-Transformer, 63.33% for EfficientNet-B2, and 31.17% for HRNet-W32). Our model is the first automated detector of laryngeal invasion in VFSS videos based on video action recognition networks. Considering its high and balanced performance, it may serve as an effective screening tool before clinicians review VFSS videos, ultimately reducing the burden on clinicians.
AB - We aimed to develop an automated detector that determines laryngeal invasion during swallowing. Laryngeal invasion, which causes significant clinical problems, is defined as two or more points on the penetration–aspiration scale (PAS). We applied two three-dimensional (3D) stream networks for action recognition in videofluoroscopic swallowing study (VFSS) videos. To detect laryngeal invasion (PAS 2 or higher scores) in VFSS videos, we employed two 3D stream networks for action recognition. To establish the robustness of our model, we compared its performance with those of various current image classification-based architectures. The proposed model achieved an accuracy of 92.10%. Precision, recall, and F1 scores for detecting laryngeal invasion (≥PAS 2) in VFSS videos were 0.9470 each. The accuracy of our model in identifying laryngeal invasion surpassed that of other updated image classification models (60.58% for ResNet101, 60.19% for Swin-Transformer, 63.33% for EfficientNet-B2, and 31.17% for HRNet-W32). Our model is the first automated detector of laryngeal invasion in VFSS videos based on video action recognition networks. Considering its high and balanced performance, it may serve as an effective screening tool before clinicians review VFSS videos, ultimately reducing the burden on clinicians.
KW - artificial intelligence
KW - deep learning
KW - deglutition disorders
KW - fluoroscopy
UR - http://www.scopus.com/inward/record.url?scp=85198366159&partnerID=8YFLogxK
U2 - 10.3390/diagnostics14131444
DO - 10.3390/diagnostics14131444
M3 - Article
AN - SCOPUS:85198366159
SN - 2075-4418
VL - 14
JO - Diagnostics
JF - Diagnostics
IS - 13
M1 - 1444
ER -