SOTAVerified

MVBench

Papers

Showing 1–19 of 19 papers

| Title | Status | Hype |
|---|---|---|
| CogVLM2: Visual Language Models for Image and Video Understanding | Code | 9 |
| Video-R1: Reinforcing Video Reasoning in MLLMs | Code | 4 |
| PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Code | 4 |
| Flash-VStream: Efficient Real-Time Understanding for Long Video Streams | Code | 3 |
| VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Code | 3 |
| Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Code | 3 |
| VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Code | 3 |
| Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model | Code | 2 |
| LLaVAction: evaluating and training multi-modal large language models for action recognition | Code | 2 |
| Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos | Code | 2 |
| ST-LLM: Large Language Models Are Effective Temporal Learners | Code | 2 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Code | 2 |
| VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | Code | 1 |
| TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | Code | 1 |
| GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning | | 0 |
| VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment | | 0 |
| VideoSAVi: Self-Aligned Video Language Models without Human Supervision | | 0 |
| Enhancing Temporal Modeling of Video LLMs via Time Gating | Code | 0 |
| VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges | | 0 |
No leaderboard results yet.