| VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Jun 12, 2025 | MME, Video MME | Code Available | 2 |
| FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Apr 24, 2025 | document understanding, MME | Code Available | 1 |
| BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Mar 27, 2025 | Form, Language Modeling | Code Available | 1 |
| Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | May 31, 2024 | MME, Video MME | Code Available | 1 |
| SiLVR: A Simple Language-based Video Reasoning Framework | May 30, 2025 | Math, MME | Code Available | 1 |
| ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification | Oct 11, 2024 | MME, Quantization | Unverified | 0 |
| Apollo: An Exploration of Video Understanding in Large Multimodal Models | Dec 13, 2024 | MME, Video MME | Unverified | 0 |
| DrVideo: Document Retrieval Based Long Video Understanding | Jun 18, 2024 | document understanding, EgoSchema | Unverified | 0 |
| DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding | Jun 4, 2025 | MME, Video MME | Unverified | 0 |
| Improving LLM Video Understanding with 16 Frames Per Second | Mar 18, 2025 | MME, Video MME | Unverified | 0 |