Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1051–1075 of 1149 papers

Title	Date	Tasks	Status
Temporally smooth online action detection using cycle-consistent future anticipation	Apr 16, 2021	Action DetectionAutonomous Driving	CodeCode Available
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding	Jul 9, 2023	Action RecognitionAction Segmentation	CodeCode Available
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube	Apr 29, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	CodeCode Available
Temporal Action Proposal Generation With Action Frequency Adaptive Network	Jun 23, 2023	Knowledge DistillationTemporal Action Proposal Generation	CodeCode Available
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot	May 16, 2023	Emotion ClassificationQuestion Answering	CodeCode Available
Telling Stories for Common Sense Zero-Shot Action Recognition	Sep 29, 2023	Action RecognitionArticles	CodeCode Available
Technical Report for CVPR 2022 LOVEU AQTC Challenge	Jun 29, 2022	Video Understanding	CodeCode Available
Tiny Video Networks	Oct 15, 2019	CPUGPU	CodeCode Available
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning	Jun 1, 2023	Incremental LearningKnowledge Distillation	CodeCode Available
Task-Aware KV Compression For Cost-Effective Long Video Understanding	Jun 26, 2025	Video Understanding	CodeCode Available
TAda! Temporally-Adaptive Convolutions for Video Understanding	Oct 12, 2021	Action ClassificationAction Recognition	CodeCode Available
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning	Dec 7, 2021	Contrastive LearningRepresentation Learning	CodeCode Available
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach	Jun 30, 2022	Boundary DetectionGeneric Event Boundary Detection	CodeCode Available
Streaming Detection of Queried Event Start	Dec 4, 2024	Autonomous Drivingparameter-efficient fine-tuning	CodeCode Available
Hallucination Mitigation Prompts Long-term Video Understanding	Jun 17, 2024	Answer GenerationHallucination	CodeCode Available
Gaussian Temporal Awareness Networks for Action Localization	Sep 9, 2019	Action Localizationobject-detection	CodeCode Available
FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos	Dec 22, 2024	Language ModellingLarge Language Model	CodeCode Available
Video action detection by learning graph-based spatio-temporal interactions	Dec 9, 2019	Action DetectionAction Localization	CodeCode Available
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks	Mar 24, 2022	Action RecognitionRetrieval	CodeCode Available
Spatio-Temporal Perturbations for Video Attribution	Sep 1, 2021	Video Understanding	CodeCode Available
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework	Apr 9, 2021	Language ModellingMultiple-choice	CodeCode Available
SoccerNet 2024 Challenges Results	Sep 16, 2024	Action SpottingDense Video Captioning	CodeCode Available
Few-Shot Referring Relationships in Videos	Jan 1, 2023	ObjectRelation Network	CodeCode Available
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality	Mar 28, 2024	Data AugmentationDiversity	CodeCode Available
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding	May 22, 2025	Action ClassificationAutomatic Speech Recognition	CodeCode Available

Show:10 25 50

← PrevPage 43 of 46Next →

No leaderboard results yet.