SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 951–960 of 1149 papers

Title | Status | Hype
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment | | 0
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | | 0
VEU-Bench: Towards Comprehensive Understanding of Video Editing | | 0
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning | | 0
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models | | 0
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation | | 0
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models | | 0
Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks | | 0
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | | 0
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation | | 0

No leaderboard results yet.