Training-free framework that converts SAM3 into a real-time multi-class open-vocabulary detector. Achieves 55.8 AP on COCO val2017 (80 classes) and runs at 15.8 FPS (measured with 4 classes at 1008 px) on a single RTX 4080.
Abstract: This study presents ThermoYOLO, a real-time human detection system using thermal imagery and the YOLOv12 object detection framework, optimized for UAV-based search and rescue (SAR) ...