Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 401–425 of 1149 papers

Title	Date	Tasks	Status
Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey	Jun 5, 2022	3D Hand Pose Estimation, Domain Adaptation	Unverified
An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform	Jun 26, 2017	Classification, Deep Learning	Unverified
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Nov 19, 2024	GPU, Question Answering	Unverified
Label Denoising with Large Ensembles of Heterogeneous Neural Networks	Sep 12, 2018	Data Augmentation, Denoising	Unverified
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action Recognition, Activity Recognition	Unverified
DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding	Jun 4, 2025	MME, Video MME	Unverified
BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes	Apr 4, 2024	Object, Video Understanding	Unverified
An Attempt towards Interpretable Audio-Visual Video Captioning	Dec 7, 2018	Audio captioning, Audio-Visual Video Captioning	Unverified
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding	Nov 19, 2024	Question Answering, Video Understanding	Unverified
Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering	Jul 1, 2022	Question Answering, Video Question Answering	Unverified
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding	Nov 21, 2024	Computational Efficiency, Video Understanding	Unverified
Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition	Dec 13, 2018	3D Action Recognition, Action Recognition	Unverified
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training	Nov 23, 2022	Action Recognition, Temporal Action Localization	Unverified
Beyond the Camera: Neural Networks in World Coordinates	Mar 12, 2020	Action Recognition, Video Stabilization	Unverified
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks	Oct 7, 2023	Action Recognition, Multiple-choice	Unverified
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs	Apr 23, 2025	Token Reduction, Video Understanding	Unverified
DualX-VSR: Dual Axial SpatialTemporal Transformer for Real-World Video Super-Resolution without Motion Compensation	Jun 5, 2025	Motion Compensation, Optical Flow Estimation	Unverified
Beyond still images: Temporal features and input variance resilience	Nov 1, 2023	Video Understanding	Unverified
DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM	Oct 3, 2024	Object Tracking, Video Understanding	Unverified
Abductive Ego-View Accident Video Understanding for Safe Driving Perception	Mar 1, 2024	Object, object-detection	Unverified
An Analysis of Data Transformation Effects on Segment Anything 2	Feb 25, 2025	Semantic Segmentation, Video Object Segmentation	Unverified
Language as the Medium: Multimodal Video Classification through text only	Sep 19, 2023	Action Recognition, Video Classification	Unverified
Dilated Temporal Relational Adversarial Network for Generic Video Summarization	Apr 30, 2018	Generative Adversarial Network, Video Summarization	Unverified
DrVideo: Document Retrieval Based Long Video Understanding	Jun 18, 2024	document understanding, EgoSchema	Unverified
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model	Oct 2, 2023	Autonomous Driving, Language Modeling	Unverified

