Abstract: For the knowledge-graph-to-text (KG-to-text) generation task, recent works have attempted to incorporate graph structure information into pre-trained ...
Abstract: Multimodal emotion recognition (MER), leveraging speech and text, has emerged as a pivotal domain within human-computer interaction, demanding sophisticated methods for effective multimodal ...
Boosted by Multi-modal Large Language Models (MLLMs), text-guided universal segmentation models for the image and video domains have made rapid progress recently. However, these methods are often ...