| SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Apr 2, 2025 | MMESpatial Reasoning | CodeCode Available | 2 |
| BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Mar 27, 2025 | FormLanguage Modeling | CodeCode Available | 1 |
| Improving LLM Video Understanding with 16 Frames Per Second | Mar 18, 2025 | MMEVideo MME | —Unverified | 0 |
| Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Mar 17, 2025 | AttributeMME | —Unverified | 0 |
| QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Mar 11, 2025 | AutoMLDecoder | CodeCode Available | 2 |
| Temporal Preference Optimization for Long-Form Video Understanding | Jan 23, 2025 | FormMME | —Unverified | 0 |
| Apollo: An Exploration of Video Understanding in Large Multimodal Models | Dec 13, 2024 | MMEVideo MME | —Unverified | 0 |
| Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Dec 12, 2024 | EgoSchema | CodeCode Available | 3 |
| SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context | Nov 25, 2024 | Large Language ModelMME | —Unverified | 0 |
| Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Nov 20, 2024 | GPUMME | CodeCode Available | 3 |