SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 11–20 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs | | 0 |
| GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | Code | 7 |
| Flash-VStream: Efficient Real-Time Understanding for Long Video Streams | Code | 3 |
| ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment | Code | 0 |
| Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs | | 0 |
| LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs | Code | 2 |
| Task-Aware KV Compression For Cost-Effective Long Video Understanding | Code | 0 |
| IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes | | 0 |
| PEVLM: Parallel Encoding for Vision-Language Models | | 0 |
| GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | | 0 |

No leaderboard results yet.