SOTAVerified

Video-MME

Papers

Showing 1-26 of 26 papers

Title | Status | Hype
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation | Code | 4
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Code | 4
Long Context Transfer from Language to Vision | Code | 4
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams | Code | 3
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos | Code | 3
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Code | 3
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | Code | 3
VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Code | 2
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Code | 2
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Code | 2
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Code | 2
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Code | 1
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Code | 1
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs | | 0
DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding | | 0
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes | | 0
Improving LLM Video Understanding with 16 Frames Per Second | | 0
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | | 0
Temporal Preference Optimization for Long-Form Video Understanding | | 0
Apollo: An Exploration of Video Understanding in Large Multimodal Models | | 0
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context | | 0
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification | | 0
Temporal Reasoning Transfer from Text to Video | | 0
DrVideo: Document Retrieval Based Long Video Understanding | | 0

No leaderboard results yet.