SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 476–500 of 1149 papers

Title	Date	Tasks	Status
ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task	Apr 20, 2025	Language ModelingLanguage Modelling	—Unverified
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding	Apr 20, 2025	Language ModelingLanguage Modelling	—Unverified
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?	Apr 19, 2025	Video Understanding	—Unverified
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval	Apr 17, 2025	Partially Relevant Video RetrievalRetrieval	—Unverified
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization	Apr 16, 2025	HallucinationQuestion Answering	—Unverified
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild	Apr 15, 2025	SegmentationSemantic Segmentation	—Unverified
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding	Apr 15, 2025	Semantic SegmentationVideo Generation	—Unverified
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model	Apr 14, 2025	Computational EfficiencyLanguage Modeling	—Unverified
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking	Apr 11, 2025	Moment RetrievalQuestion Answering	—Unverified
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding	Apr 10, 2025	Video Understanding	—Unverified
How Can Objects Help Video-Language Understanding?	Apr 10, 2025	Image CaptioningObject	—Unverified
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding	Apr 10, 2025	Instruction FollowingVideo Understanding	—Unverified
From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction	Apr 8, 2025	Game State ReconstructionJersey Number Recognition	—Unverified
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models	Apr 8, 2025	In-Context LearningInstruction Following	—Unverified
InstructionBench: An Instructional Video Understanding Benchmark	Apr 7, 2025	Common Sense ReasoningMultiple-choice	—Unverified
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval	Apr 3, 2025	Information RetrievalRepresentation Learning	—Unverified
Moment Quantization for Video Temporal Grounding	Apr 3, 2025	QuantizationVideo Understanding	—Unverified
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding	Apr 2, 2025	Video Understanding	—Unverified
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Apr 2, 2025	Action RecognitionAll	—Unverified
Aligned Better, Listen Better for Audio-Visual Large Language Models	Apr 2, 2025	Video Understanding	—Unverified
DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description	Mar 31, 2025	Video DescriptionVideo Understanding	—Unverified
H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding	Mar 31, 2025	Video Understanding	—Unverified
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition	Mar 30, 2025	Action ClassificationAction Recognition	—Unverified
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts	Mar 29, 2025	Streaming video understandingVideo Understanding	—Unverified
Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding	Mar 26, 2025	GPUQuestion Answering	—Unverified

Show:10 25 50

← PrevPage 20 of 46Next →

No leaderboard results yet.