SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 551-575 of 1149 papers

Title | Status | Hype
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment | | 0
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | | 0
VEU-Bench: Towards Comprehensive Understanding of Video Editing | | 0
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning | | 0
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models | | 0
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation | | 0
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models | | 0
Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks | | 0
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | | 0
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation | | 0
Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos | | 0
Video Domain Incremental Learning for Human Action Recognition in Home Environments | | 0
Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models | | 0
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding | | 0
VideoGLUE: Video General Understanding Evaluation of Foundation Models | | 0
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding | | 0
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding | | 0
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | | 0
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | | 0
Video Language Model Pretraining with Spatio-temporal Masking | | 0
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges | | 0
VideoLLM Benchmarks and Evaluation: A Survey | | 0
VideoMCC: a New Benchmark for Video Comprehension | | 0
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | | 0
VideoPrism: A Foundational Visual Encoder for Video Understanding | | 0

No leaderboard results yet.