| VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation | May 20, 2025 | MME, Multiple-choice | Code Available | 4 |
| Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Apr 21, 2025 | MME, Video MME | Code Available | 4 |
| Long Context Transfer from Language to Vision | Jun 24, 2024 | Language Modeling | Code Available | 4 |
| Flash-VStream: Efficient Real-Time Understanding for Long Video Streams | Jun 30, 2025 | Cross-modal Alignment, EgoSchema | Code Available | 3 |
| TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos | Apr 24, 2025 | MME, Video MME | Code Available | 3 |
| Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Dec 12, 2024 | EgoSchema | Code Available | 3 |
| Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Nov 20, 2024 | GPU, MME | Code Available | 3 |
| VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Jun 12, 2025 | MME, Video MME | Code Available | 2 |
| SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Apr 2, 2025 | MME, Spatial Reasoning | Code Available | 2 |
| QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Mar 11, 2025 | AutoML, Decoder | Code Available | 2 |