SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 551–575 of 1149 papers

Title	Date	Tasks	Status
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment	Jun 16, 2024	Action UnderstandingBenchmarking	—Unverified
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks	Jun 10, 2025	Multiple-choiceOpen-Ended Question Answering	—Unverified
VEU-Bench: Towards Comprehensive Understanding of Video Editing	Jan 1, 2025	Video EditingVideo Understanding	—Unverified
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning	May 21, 2025	Pseudo LabelReinforcement Learning (RL)	—Unverified
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models	Nov 16, 2024	HallucinationVideo Generation	—Unverified
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation	Dec 12, 2024	Phrase GroundingQuestion Answering	—Unverified
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models	Oct 15, 2024	Video Understanding	—Unverified
Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks	Feb 24, 2023	ClassificationData Augmentation	—Unverified
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding	Mar 18, 2024	EgoSchemaVideo Understanding	—Unverified
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation	Nov 20, 2024	ChatbotMultiple-choice	—Unverified
Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos	Jul 22, 2020	Action RecognitionTemporal Action Localization	—Unverified
Video Domain Incremental Learning for Human Action Recognition in Home Environments	Dec 22, 2024	Action Recognitionclass-incremental learning	—Unverified
Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models	Jul 8, 2025	Future predictionLarge Language Model	—Unverified
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding	Apr 10, 2025	Instruction FollowingVideo Understanding	—Unverified
VideoGLUE: Video General Understanding Evaluation of Foundation Models	Jul 6, 2023	Action RecognitionTemporal Localization	—Unverified
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Dec 31, 2023	Spatio-Temporal Video GroundingVideo Grounding	—Unverified
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Jan 1, 2024	Spatio-Temporal Video GroundingVideo Grounding	—Unverified
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models	Jun 24, 2024	HallucinationVideo Understanding	—Unverified
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding	Jul 17, 2025	Video GroundingVideo Understanding	—Unverified
Video Language Model Pretraining with Spatio-temporal Masking	Jan 1, 2025	DecoderLanguage Modeling	—Unverified
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges	Sep 2, 2024	GPUMVBench	—Unverified
VideoLLM Benchmarks and Evaluation: A Survey	May 3, 2025	SurveyVideo Understanding	—Unverified
VideoMCC: a New Benchmark for Video Comprehension	Jun 23, 2016	Multiple-choiceVideo Description	—Unverified
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition	May 7, 2024	Large Language ModelMultimodal Large Language Model	—Unverified
VideoPrism: A Foundational Visual Encoder for Video Understanding	Feb 20, 2024	Question AnsweringVideo Question Answering	—Unverified

Show:10 25 50

← PrevPage 23 of 46Next →

No leaderboard results yet.