| VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation | May 20, 2025 | MMEMultiple-choice | CodeCode Available | 4 |
| Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Apr 21, 2025 | MMEVideo MME | CodeCode Available | 4 |
| Long Context Transfer from Language to Vision | Jun 24, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 4 |
| Flash-VStream: Efficient Real-Time Understanding for Long Video Streams | Jun 30, 2025 | cross-modal alignmentEgoSchema | CodeCode Available | 3 |
| TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos | Apr 24, 2025 | MMEVideo MME | CodeCode Available | 3 |
| Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Dec 12, 2024 | EgoSchema | CodeCode Available | 3 |
| Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Nov 20, 2024 | GPUMME | CodeCode Available | 3 |
| VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Jun 12, 2025 | MMEVideo MME | CodeCode Available | 2 |
| SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Apr 2, 2025 | MMESpatial Reasoning | CodeCode Available | 2 |
| QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Mar 11, 2025 | AutoMLDecoder | CodeCode Available | 2 |
| VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | May 29, 2024 | EgoSchemaMME | CodeCode Available | 2 |
| SiLVR: A Simple Language-based Video Reasoning Framework | May 30, 2025 | MathMME | CodeCode Available | 1 |
| FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Apr 24, 2025 | document understandingMME | CodeCode Available | 1 |
| BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Mar 27, 2025 | FormLanguage Modeling | CodeCode Available | 1 |
| Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | May 31, 2024 | MMEVideo MME | CodeCode Available | 1 |
| Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs | Jun 27, 2025 | MMEVideo MME | —Unverified | 0 |
| DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding | Jun 4, 2025 | MMEVideo MME | —Unverified | 0 |
| An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes | Apr 21, 2025 | MMEVideo MME | —Unverified | 0 |
| Improving LLM Video Understanding with 16 Frames Per Second | Mar 18, 2025 | MMEVideo MME | —Unverified | 0 |
| Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Mar 17, 2025 | AttributeMME | —Unverified | 0 |
| Temporal Preference Optimization for Long-Form Video Understanding | Jan 23, 2025 | FormMME | —Unverified | 0 |
| Apollo: An Exploration of Video Understanding in Large Multimodal Models | Dec 13, 2024 | MMEVideo MME | —Unverified | 0 |
| SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context | Nov 25, 2024 | Large Language ModelMME | —Unverified | 0 |
| ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification | Oct 11, 2024 | MMEQuantization | —Unverified | 0 |
| Temporal Reasoning Transfer from Text to Video | Oct 8, 2024 | DiagnosticMME | —Unverified | 0 |
| DrVideo: Document Retrieval Based Long Video Understanding | Jun 18, 2024 | document understandingEgoSchema | —Unverified | 0 |