Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 476–500 of 1149 papers

Title	Date	Tasks	Status
InstructionBench: An Instructional Video Understanding Benchmark	Apr 7, 2025	Common Sense Reasoning, Multiple-choice	Unverified
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding	Oct 31, 2021	Action Recognition, Text Detection	Unverified
AVT: Audio-Video Transformer for Multimodal Action Recognition	Sep 22, 2022	Action Recognition, Audio Classification	Unverified
Aligned Better, Listen Better for Audio-Visual Large Language Models	Apr 2, 2025	Video Understanding	Unverified
Disentangle and denoise: Tackling context misalignment for video moment retrieval	Aug 14, 2024	Denoising, Disentanglement	Unverified
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding	Jun 18, 2025	GPU, Streaming video understanding	Unverified
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs	Jun 5, 2025	Benchmarking, Video Understanding	Unverified
Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling	Sep 21, 2018	General Classification, Video Classification	Unverified
Large Scale Video Representation Learning via Relational Graph Clustering	Jun 1, 2020	Clustering, Graph Clustering	Unverified
Large-Scale YouTube-8M Video Understanding with Deep Neural Networks	Jun 14, 2017	Classification, General Classification	Unverified
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision	Apr 15, 2023	Language Modeling, Language Modelling	Unverified
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training	Nov 23, 2022	Action Recognition, Temporal Action Localization	Unverified
Beyond the Camera: Neural Networks in World Coordinates	Mar 12, 2020	Action Recognition, Video Stabilization	Unverified
Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition	Dec 13, 2018	3D Action Recognition, Action Recognition	Unverified
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection	Aug 8, 2021	Action Detection, Knowledge Distillation	Unverified
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval	Apr 3, 2025	Information Retrieval, Representation Learning	Unverified
Learning Dynamic MRI Reconstruction with Convolutional Network Assisted Reconstruction Swin Transformer	Sep 19, 2023	Anatomy, Computational Efficiency	Unverified
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking	Jun 7, 2021	Graph Neural Network, Multi-Person Pose Estimation	Unverified
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment	Jun 8, 2023	Video Understanding	Unverified
Learning from Multiple Sources for Video Summarisation	Jan 13, 2015	Clustering, Video Understanding	Unverified
Learning Higher-order Object Interactions for Keypoint-based Video Understanding	May 16, 2023	Action Localization, Action Recognition	Unverified
Inductive Attention for Video Action Anticipation	Dec 17, 2022	Action Anticipation, Action Recognition	Unverified
Discrete neural representations for explainable anomaly detection	Dec 10, 2021	Anomaly Detection, Object	Unverified
Improving Video Model Transfer With Dynamic Representation Learning	Jan 1, 2022	Action Classification, Knowledge Distillation	Unverified
Improving LLM Video Understanding with 16 Frames Per Second	Mar 18, 2025	MME, Video MME	Unverified

