| Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding | Oct 17, 2024 | cross-modal alignment, Sentence | Unverified | 0 |
| Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms | Oct 17, 2024 | cross-modal alignment, Large Language Model | Unverified | 0 |
| OMCAT: Omni Context Aware Transformer | Oct 15, 2024 | Audio-Visual Question Answering (AVQA) | Unverified | 0 |
| Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective | Oct 14, 2024 | cross-modal alignment, Image Generation | Code Available | 0 |
| EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment | Oct 8, 2024 | cross-modal alignment, Hallucination | Unverified | 0 |
| Intriguing Properties of Large Language and Vision Models | Oct 7, 2024 | cross-modal alignment, Large Language Model | Unverified | 0 |
| TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation | Oct 5, 2024 | cross-modal alignment, Retrieval | Unverified | 0 |
| Fully Aligned Network for Referring Image Segmentation | Sep 29, 2024 | cross-modal alignment, Decoder | Unverified | 0 |
| Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training | Sep 25, 2024 | Classification, cross-modal alignment | Unverified | 0 |
| TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models | Sep 23, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |