Vision, audio, and language models for surgical interventions

The aim of this project is to propose vision and language models that support the analysis and prediction of surgical team performance in the operating room, facilitating human-in-the-loop artificial intelligence. This PhD project will be carried out in the context of neurosurgical interventions, particularly endoscopic transsphenoidal pituitary surgery (eTSS), for which our group is curating a multimodal dataset. The specific objectives of this research are to: 1) investigate audio and language models that automatically transcribe audio into language, as sketched below; 2) design vision-audio-and-language models of surgical data captured in the mock operating room to assess the communication of the surgical team; 3) design vision-audio-and-language models of surgical data captured in the mock operating room to process feedback articulated by experts and deliver it automatically to trainees; and 4) evaluate the performance of such models retrospectively on publicly available datasets and on a private collection of surgical videos recorded in the mock operating room.
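As a minimal illustration of objective 1, the sketch below transcribes a recorded audio segment with an off-the-shelf speech-recognition model through the Hugging Face transformers pipeline. The Whisper checkpoint and the audio file name are illustrative assumptions, not choices made by the project.

from transformers import pipeline

# Off-the-shelf speech-to-text model; the checkpoint is assumed for illustration.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Transcribe a recorded segment of operating-room communication.
# return_timestamps=True keeps segment timings so the transcript can later
# be aligned with the corresponding surgical video.
result = asr("mock_or_audio_segment.wav", return_timestamps=True)
print(result["text"])

The resulting time-stamped transcript would then serve as the language input to the vision-audio-and-language models described in objectives 2 and 3.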
