
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 676–700 of 1149 papers

Title	Date	Tasks	Status
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation	Mar 30, 2021	Action Detection, Temporal Action Proposal Generation	Unverified
A Unified Framework for Human-centric Point Cloud Video Understanding	Mar 29, 2024	3D Pose Estimation, Action Recognition	Unverified
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset	Nov 19, 2022	Common Sense Reasoning, Graph Embedding	Unverified
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark	Oct 4, 2024	Image Captioning, Video Understanding	Unverified
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training	Jul 5, 2020	Decoder, Question Answering	Unverified
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search	Dec 9, 2021	Neural Architecture Search, Video Recognition	Unverified
AVD2: Accident Video Diffusion for Accident Video Description	Feb 20, 2025	Autonomous Driving, Scene Understanding	Unverified
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding	Mar 24, 2024	Dense Video Captioning, Temporal Localization	Unverified
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs	Jun 5, 2025	Benchmarking, Video Understanding	Unverified
AVT: Audio-Video Transformer for Multimodal Action Recognition	Sep 22, 2022	Action Recognition, Audio Classification	Unverified
BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation	Aug 1, 2022	Object, Optical Flow Estimation	Unverified
BEARCUBS: A benchmark for computer-using web agents	Mar 10, 2025	Video Understanding	Unverified
BERT for Large-scale Video Segment Classification with Test-time Augmentation	Dec 2, 2019	General Classification, Video Understanding	Unverified
Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation	Jul 8, 2025	Depth Estimation, Depth Prediction	Unverified
Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection	Dec 6, 2024	GPU, Multi-Object Tracking	Unverified
Beyond still images: Temporal features and input variance resilience	Nov 1, 2023	Video Understanding	Unverified
Beyond the Camera: Neural Networks in World Coordinates	Mar 12, 2020	Action Recognition, Video Stabilization	Unverified
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding	Nov 21, 2024	Computational Efficiency, Video Understanding	Unverified
BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes	Apr 4, 2024	Object, Video Understanding	Unverified
Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?	May 20, 2025	Video Understanding	Unverified
Breaking the Encoder Barrier for Seamless Video-Language Understanding	Mar 24, 2025	Decoder, Language Modeling	Unverified
Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models	Jun 6, 2025	Segmentation, Video Understanding	Unverified
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens	Jun 13, 2022	Action Recognition, Video Understanding	Unverified
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs	Jan 8, 2025	EgoSchema, Object Tracking	Unverified
Building Scalable Video Understanding Benchmarks through Sports	Jan 17, 2023	Video Understanding	Unverified


No leaderboard results yet.