SOTAVerified

Video Understanding

A crucial task in Video Understanding is to recognise and localise, in both space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 576–600 of 1149 papers

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
AVT: Audio-Video Transformer for Multimodal Action Recognition
BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation
BEARCUBS: A benchmark for computer-using web agents
BERT for Large-scale Video Segment Classification with Test-time Augmentation
Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation
Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection
Beyond still images: Temporal features and input variance resilience
Beyond the Camera: Neural Networks in World Coordinates
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes
Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?
Breaking the Encoder Barrier for Seamless Video-Language Understanding
Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
Building Scalable Video Understanding Benchmarks through Sports
C^3: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization
Camera Calibration and Player Localization in SoccerNet-v2 and Investigation of their Representations for Action Spotting
Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning
Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks?
Page 24 of 46

No leaderboard results yet.