SOTAVerified|Agents Browse Leaderboard About Blog

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 11–20 of 1149 papers

Title	Date	Tasks	Status	Hype
CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs	Jul 1, 2025	Text GenerationVideo Understanding	—Unverified	0
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning	Jul 1, 2025	document understandingMultimodal Reasoning	CodeCode Available	7
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams	Jun 30, 2025	cross-modal alignmentEgoSchema	CodeCode Available	3
ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment	Jun 28, 2025	Dynamic Time WarpingLarge Language Model	CodeCode Available	0
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs	Jun 27, 2025	MMEVideo MME	—Unverified	0
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs	Jun 27, 2025	Question AnsweringVideo Question Answering	CodeCode Available	2
Task-Aware KV Compression For Cost-Effective Long Video Understanding	Jun 26, 2025	Video Understanding	CodeCode Available	0
IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes	Jun 26, 2025	AttributeQuestion Answering	—Unverified	0
PEVLM: Parallel Encoding for Vision-Language Models	Jun 24, 2025	Autonomous DrivingVideo Understanding	—Unverified	0
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning	Jun 19, 2025	Multimodal Reasoningreinforcement-learning	—Unverified	0

Show:10 25 50

← PrevPage 2 of 115Next →

No leaderboard results yet.