| ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting | Feb 20, 2025 | Image Captioning, multimodal interaction | Unverified | 0 |
| Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | Feb 12, 2025 | Math, multimodal interaction | Unverified | 0 |
| Towards Explainable Multimodal Depression Recognition for Clinical Interviews | Jan 27, 2025 | Decision Making, Depression Detection | Code Available | 0 |
| FGU3R: Fine-Grained Fusion via Unified 3D Representation for Multimodal 3D Object Detection | Jan 8, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining | Jan 1, 2025 | Domain Adaptation, Model Selection | Unverified | 0 |
| Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer | Dec 24, 2024 | Gesture Recognition, multimodal interaction | Unverified | 0 |
| CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation | Nov 15, 2024 | Emotion Recognition, Emotion Recognition in Conversation | Unverified | 0 |
| Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability | Nov 15, 2024 | multimodal interaction | Unverified | 0 |
| Spider: Any-to-Many Multimodal LLM | Nov 14, 2024 | multimodal interaction | Code Available | 1 |
| MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval | Nov 13, 2024 | Image Comprehension, Information Retrieval | Code Available | 0 |