Abstract: In audio classification, developing efficient and robust models is critical for real-time applications. Inspired by the design principles of MobileViT, we present FAST (Fast Audio ...
Abstract: This study proposes a novel multimodal deep learning framework for depression detection, integrating visual, audio, and textual data. Using OpenFace and Librosa for feature extraction, the ...