| Audio-Visual Semantic Graph Network for Audio-Visual Event Localization | Jan 1, 2025 | audio-visual event localization, cross-modal alignment | Unverified | 0 |
| Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation | Jan 1, 2025 | Classification, cross-modal alignment | Unverified | 0 |
| Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment | Jan 1, 2025 | Attribute, cross-modal alignment | Unverified | 0 |
| Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment | Dec 30, 2024 | cross-modal alignment, Emotion Recognition | Unverified | 0 |
| ChartAdapter: Large Vision-Language Model for Chart Summarization | Dec 30, 2024 | Chart Understanding, cross-modal alignment | Unverified | 0 |
| Enhancing Visual Representation for Text-based Person Searching | Dec 30, 2024 | cross-modal alignment, Person Search | Code Available | 0 |
| Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data | Dec 19, 2024 | AutoML, cross-modal alignment | Unverified | 0 |
| RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models | Dec 15, 2024 | Autonomous Driving, Contrastive Learning | Unverified | 0 |
| Wearable Accelerometer Foundation Models for Health via Knowledge Distillation | Dec 15, 2024 | Activity Recognition, cross-modal alignment | Unverified | 0 |
| Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction | Dec 13, 2024 | cross-modal alignment, Prediction | Unverified | 0 |