Using Vision Transformers for Classifying Surgical Tools in Computer Aided Surgeries

Hisham El Moaqet, Rami Janini, +2 authors, Knut Möller

2024 · DOI: 10.1515/cdbme-2024-2056
Current Directions in Biomedical Engineering · Cited by 2

TLDR

Pure self-attention-based models (Vision Transformers) are used to classify both single-label (SL) and multi-label (ML) frames in laparoscopic surgeries, achieving excellent classification performance with a mean average precision (mAP) of 95.8% that outperforms conventional deep-learning multi-label models developed in previous studies.

Abstract

Automated laparoscopic video analysis is essential for assisting surgeons during computer-aided medical procedures. Nevertheless, it faces challenges due to complex surgical scenes and limited annotated data. Most existing methods for classifying surgical tools in laparoscopic surgeries rely on conventional deep learning methods such as convolutional and recurrent neural networks. This paper explores the use of pure self-attention-based models (Vision Transformers) for classifying both single-label (SL) and multi-label (ML) frames in laparoscopic surgeries. The proposed SL and ML models were comprehensively evaluated on the Cholec80 surgical workflow dataset using 5-fold cross-validation. Experimental results showed excellent classification performance with a mean average precision (mAP) of 95.8%, outperforming conventional deep learning multi-label models developed in previous studies. Our results open new avenues for further research on the use of deep transformer models for surgical tool detection in modern operating theaters.
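The reported metric, mean average precision (mAP), is the standard way to score multi-label tool-presence predictions: average precision (AP) is computed per tool class from the ranked frame scores, then averaged across classes. The paper does not give its exact evaluation code, so the following is a minimal sketch of the usual rank-based AP/mAP computation; the function names and the toy scores are illustrative, not from the paper.

```python
def average_precision(scores, labels):
    """Rank-based AP for one class: average of precision@k over positive ranks.

    scores: per-frame confidence for this tool class.
    labels: 1 if the tool is present in the frame, else 0.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    tp, ap = 0, 0.0
    for rank, (_, lab) in enumerate(ranked, start=1):
        if lab:
            tp += 1
            ap += tp / rank  # precision at this positive's rank
    return ap / total_pos


def mean_average_precision(per_class):
    """Mean of per-class APs; per_class is a list of (scores, labels) pairs."""
    return sum(average_precision(s, l) for s, l in per_class) / len(per_class)
```

For example, `average_precision([0.9, 0.8, 0.7], [1, 0, 1])` averages the precisions 1/1 and 2/3 at the two positives, giving about 0.833.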
