SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 426–450 of 1149 papers

Title	Date	Tasks	Status
DualX-VSR: Dual Axial SpatialTemporal Transformer for Real-World Video Super-Resolution without Motion Compensation	Jun 5, 2025	Motion CompensationOptical Flow Estimation	—Unverified
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs	Jun 5, 2025	BenchmarkingVideo Understanding	—Unverified
APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval	Jun 5, 2025	Information RetrievalRetrieval	—Unverified
TextVidBench: A Benchmark for Long Video Scene Text Understanding	Jun 5, 2025	Prompt EngineeringQuestion Answering	—Unverified
DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding	Jun 4, 2025	MMEVideo MME	—Unverified
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding	Jun 3, 2025	Video Understanding	CodeCode Available
InterRVOS: Interaction-aware Referring Video Object Segmentation	Jun 3, 2025	8kObject	—Unverified
EgoVLM: Policy Optimization for Egocentric Video Understanding	Jun 3, 2025	EgoSchemaQuestion Answering	CodeCode Available
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding	Jun 2, 2025	Action RecognitionVideo Understanding	—Unverified
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding	Jun 1, 2025	Video Understanding	—Unverified
Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis	May 31, 2025	Scene SegmentationSegmentation	—Unverified
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders	May 30, 2025	Video Understanding	—Unverified
Learning reusable concepts across different egocentric video understanding tasks	May 30, 2025	Video Understanding	—Unverified
VUDG: A Dataset for Video Understanding Domain Generalization	May 30, 2025	Domain GeneralizationMultiple-choice	—Unverified
Time Blindness: Why Video-Language Models Can't See What Humans Can?	May 30, 2025	Temporal SequencesVideo Understanding	—Unverified
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding	May 29, 2025	AvgVideo Understanding	CodeCode Available
MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection	May 29, 2025	image-classificationImage Classification	—Unverified
Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding	May 29, 2025	RAGRetrieval-augmented Generation	—Unverified
Universal Visuo-Tactile Video Understanding for Embodied Interaction	May 28, 2025	FrictionLarge Language Model	—Unverified
Two Causally Related Needles in a Video Haystack	May 26, 2025	Video UnderstandingVisual Grounding	—Unverified
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos	May 26, 2025	AttributeVideo Understanding	CodeCode Available
AdaTP: Attention-Debiased Token Pruning for Video Large Language Models	May 26, 2025	Video Understanding	—Unverified
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs	May 25, 2025	Video Understanding	—Unverified
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding	May 23, 2025	FormQuestion Answering	—Unverified
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding	May 22, 2025	Action ClassificationAutomatic Speech Recognition	CodeCode Available

Show:10 25 50

← PrevPage 18 of 46Next →

No leaderboard results yet.