Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) the different actions or events appearing in a video.

Source: Action Detection from a Robot-Car Perspective

Papers



Title	Date	Tasks	Status	Score
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations	Mar 25, 2025	Representation Learning, Video Understanding	Code Available	5
ECO: Efficient Convolutional Network for Online Video Understanding	Apr 24, 2018	Action Classification, Action Recognition	Code Available	5
(Un)likelihood Training for Interpretable Embedding	Jul 1, 2022	Ad-hoc video search, Decoder	Code Available	5
Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images	Oct 4, 2018	Domain Adaptation, Image-to-Image Translation	Code Available	5
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark	Oct 2, 2024	Unusual Activity Localization, Video Understanding	Code Available	5
ACVUBench: Audio-Centric Video Understanding Benchmark	Mar 25, 2025	Video Understanding	Code Available	5
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos	May 26, 2025	Attribute, Video Understanding	Code Available	5
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition	Mar 30, 2017	Action Classification, Action Recognition	Code Available	5
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model	Jun 15, 2024	Question Answering, Video Understanding	Code Available	5
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube	Apr 29, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	5
DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture	Apr 18, 2020	Anomaly Detection, Classification	Code Available	5
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA	May 7, 2020	Question Answering, Video Question Answering	Code Available	5
Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning	Jan 8, 2024	object-detection, Object Detection	Code Available	5
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality	Mar 28, 2024	Data Augmentation, Diversity	Code Available	5
Don't Judge by the Look: Towards Motion Coherent Video Representation	Mar 14, 2024	Data Augmentation, Object Recognition	Code Available	5
Tiny Video Networks	Oct 15, 2019	CPU, GPU	Code Available	5
The YouTube-8M Kaggle Competition: Challenges and Methods	Jun 28, 2017	General Classification, Video Classification	Code Available	5
The Visual Centrifuge: Model-Free Layered Video Representations	Dec 4, 2018	Color Constancy, model	Code Available	5
Temporal Tessellation: A Unified Approach for Video Analysis	Dec 21, 2016	Action Detection, Video Captioning	Code Available	5
The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge	Jun 16, 2017	General Classification, Video Classification	Code Available	5
In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition	Apr 14, 2024	Action Recognition, Hand Pose Estimation	Code Available	5
Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding	Jul 14, 2017	Video Recognition, Video Understanding	Code Available	5
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding	May 19, 2025	Language Modeling, Language Modelling	Code Available	5
Temporally smooth online action detection using cycle-consistent future anticipation	Apr 16, 2021	Action Detection, Autonomous Driving	Code Available	5
Temporal Action Proposal Generation With Action Frequency Adaptive Network	Jun 23, 2023	Knowledge Distillation, Temporal Action Proposal Generation	Code Available	5
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot	May 16, 2023	Emotion Classification, Question Answering	Code Available	5
Diagnosing Error in Temporal Action Detectors	Jul 27, 2018	Action Localization, Diagnostic	Code Available	5
Telling Stories for Common Sense Zero-Shot Action Recognition	Sep 29, 2023	Action Recognition, Articles	Code Available	5
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models	Jun 28, 2023	Retrieval, Video Retrieval	Code Available	5
Technical Report for CVPR 2022 LOVEU AQTC Challenge	Jun 29, 2022	Video Understanding	Code Available	5
4D Generic Video Object Proposals	Jan 26, 2019	Instance Segmentation, Object	Code Available	5
Detection-Fusion for Knowledge Graph Extraction from Videos	Dec 30, 2024	Knowledge Graphs, Language Modeling	Code Available	5
https://arxiv.org/abs/2407.00634	Jul 2, 2024	Video Captioning, Video Description	Code Available	5
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning	Jun 1, 2023	Incremental Learning, Knowledge Distillation	Code Available	5
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios	Oct 18, 2022	Video Understanding	Code Available	5
Detect-and-Track: Efficient Pose Estimation in Videos	Dec 26, 2017	Human Detection, Keypoint Estimation	Code Available	5
Task-Aware KV Compression For Cost-Effective Long Video Understanding	Jun 26, 2025	Video Understanding	Code Available	5
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios	Jun 11, 2025	Action Recognition, Action Segmentation	Code Available	5
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach	Jun 30, 2022	Boundary Detection, Generic Event Boundary Detection	Code Available	5
HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding	Jan 3, 2025	Question Answering, Video Understanding	Code Available	5
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning	Dec 7, 2021	Contrastive Learning, Representation Learning	Code Available	5
Deep Learning Methods for Efficient Large Scale Video Labeling	Jun 14, 2017	Deep Learning, Video Understanding	Code Available	5
Hierarchical Deep Recurrent Architecture for Video Understanding	Jul 11, 2017	Classification, General Classification	Code Available	5
Streaming Detection of Queried Event Start	Dec 4, 2024	Autonomous Driving, parameter-efficient fine-tuning	Code Available	5
Video action detection by learning graph-based spatio-temporal interactions	Dec 9, 2019	Action Detection, Action Localization	Code Available	5
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding	Jul 9, 2023	Action Recognition, Action Segmentation	Code Available	5
Spatio-Temporal Perturbations for Video Attribution	Sep 1, 2021	Video Understanding	Code Available	5
Hallucination Mitigation Prompts Long-term Video Understanding	Jun 17, 2024	Answer Generation, Hallucination	Code Available	5
SoccerNet 2024 Challenges Results	Sep 16, 2024	Action Spotting, Dense Video Captioning	Code Available	5
Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation	May 6, 2024	Action Segmentation, Skeleton Based Action Segmentation	Code Available	5

