Abstract
Surgical phase recognition is challenging because imbalanced data among surgical phases cause overfitting. We propose an undersampling method based on an adaptive sampling rate that yields a similar number of samples for each surgical phase, alleviating biased learning. To improve the performance of our method, we also introduce a two-stream CNN-LSTM model that extracts temporal information on behavioral changes between image frames. First, we extracted a total of 40,236 short clips from the entire video using an adaptive subsampling rate. Each short clip was fed into a pre-trained GoogLeNet. The output, which carries visual information, was then fed directly into a sequence-to-sequence LSTM model to extract temporal information from neighboring frames within a short clip. At the same time, a second, sequence-to-vector LSTM was used to extract temporal information from all successive image frames and predict the final surgical phase. The proposed method was evaluated on the public Cholec80 dataset. It outperformed state-of-the-art methods, achieving an F1-score of 87.12% and an AUC of 98.00%. In addition, the F1-score deviation across phases decreased by about 10% compared with that before applying undersampling. Experimental results confirmed that the proposed method learns rich temporal information from short clips and outperforms the conventional one-stream CNN-LSTM architecture.
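The idea of undersampling with an adaptive sampling rate can be illustrated with a minimal sketch. The abstract does not state the exact rule used to choose the rate, so the rule below (per-phase stride equal to the phase's frame count divided by the smallest phase's frame count) is an assumption made for illustration only. The function names `adaptive_strides` and `undersample`, the phase labels, and the toy frame counts are hypothetical and are not taken from the paper or from Cholec80.

```python
from collections import Counter


def adaptive_strides(labels, target=None):
    """Choose one stride per phase so each phase keeps roughly `target` frames.

    Assumed rule (not from the paper): stride = frames in phase // target,
    where target defaults to the size of the rarest phase.
    """
    counts = Counter(labels)
    if target is None:
        target = min(counts.values())
    return {phase: max(1, n // target) for phase, n in counts.items()}


def undersample(labels, strides):
    """Return indices of kept frames: every strides[phase]-th frame of a phase."""
    seen = Counter()  # frames encountered so far, per phase
    kept = []
    for idx, phase in enumerate(labels):
        if seen[phase] % strides[phase] == 0:
            kept.append(idx)
        seen[phase] += 1
    return kept


# Toy frame-level labels for one video: three phases with imbalanced lengths.
labels = ["P1"] * 100 + ["P2"] * 400 + ["P3"] * 50

strides = adaptive_strides(labels)
kept = undersample(labels, strides)
balanced = Counter(labels[i] for i in kept)

print(strides)
print(balanced)
```

In this toy case the longest phase is sampled most sparsely and the shortest phase is kept in full, so the three phases end up with the same number of frames. A real pipeline would apply the same idea when cutting short clips rather than single frames.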
Original language | English |
---|---|
Article number | 105637 |
Journal | Biomedical Signal Processing and Control |
Volume | 88 |
DOIs | |
State | Published - Feb 2024 |
Keywords
- Automated surgical phase recognition
- Cholecystectomy
- Endoscopic video
- Short-clip-based
- Two-stream CNN-LSTMs
- Undersampling