SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 411–420 of 1149 papers

Title | Status | Hype
Task-Aware KV Compression For Cost-Effective Long Video Understanding | Code | 0
PEVLM: Parallel Encoding for Vision-Language Models | - | 0
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | - | 0
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding | - | 0
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization | Code | 0
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding | Code | 0
MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models | - | 0
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Code | 0
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | - | 0
MLVTG: Mamba-Based Feature Alignment and LLM-Driven Purification for Multi-Modal Video Temporal Grounding | - | 0

No leaderboard results yet.