Video Understanding

A central task in video understanding is to recognise and localise, in both space and time, the actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers



Title	Date	Tasks	Status
Temporally smooth online action detection using cycle-consistent future anticipation	Apr 16, 2021	Action Detection, Autonomous Driving	Code Available
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding	Jul 9, 2023	Action Recognition, Action Segmentation	Code Available
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube	Apr 29, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available
Temporal Action Proposal Generation With Action Frequency Adaptive Network	Jun 23, 2023	Knowledge Distillation, Temporal Action Proposal Generation	Code Available
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot	May 16, 2023	Emotion Classification, Question Answering	Code Available
Telling Stories for Common Sense Zero-Shot Action Recognition	Sep 29, 2023	Action Recognition, Articles	Code Available
Technical Report for CVPR 2022 LOVEU AQTC Challenge	Jun 29, 2022	Video Understanding	Code Available
Tiny Video Networks	Oct 15, 2019	CPU, GPU	Code Available
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning	Jun 1, 2023	Incremental Learning, Knowledge Distillation	Code Available
Task-Aware KV Compression For Cost-Effective Long Video Understanding	Jun 26, 2025	Video Understanding	Code Available
TAda! Temporally-Adaptive Convolutions for Video Understanding	Oct 12, 2021	Action Classification, Action Recognition	Code Available
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning	Dec 7, 2021	Contrastive Learning, Representation Learning	Code Available
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach	Jun 30, 2022	Boundary Detection, Generic Event Boundary Detection	Code Available
Streaming Detection of Queried Event Start	Dec 4, 2024	Autonomous Driving, parameter-efficient fine-tuning	Code Available
Hallucination Mitigation Prompts Long-term Video Understanding	Jun 17, 2024	Answer Generation, Hallucination	Code Available
Gaussian Temporal Awareness Networks for Action Localization	Sep 9, 2019	Action Localization, object-detection	Code Available
FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos	Dec 22, 2024	Language Modelling, Large Language Model	Code Available
Video action detection by learning graph-based spatio-temporal interactions	Dec 9, 2019	Action Detection, Action Localization	Code Available
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks	Mar 24, 2022	Action Recognition, Retrieval	Code Available
Spatio-Temporal Perturbations for Video Attribution	Sep 1, 2021	Video Understanding	Code Available
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework	Apr 9, 2021	Language Modelling, Multiple-choice	Code Available
SoccerNet 2024 Challenges Results	Sep 16, 2024	Action Spotting, Dense Video Captioning	Code Available
Few-Shot Referring Relationships in Videos	Jan 1, 2023	Object, Relation Network	Code Available
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality	Mar 28, 2024	Data Augmentation, Diversity	Code Available
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding	May 22, 2025	Action Classification, Automatic Speech Recognition	Code Available
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding	Mar 22, 2025	Benchmarking, Object	Code Available
Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation	May 6, 2024	Action Segmentation, Skeleton Based Action Segmentation	Code Available
Features Understanding in 3D CNNs for Actions Recognition in Video	Oct 1, 2020	Action Recognition, Decision Making	Code Available
Situational Scene Graph for Structured Human-centric Situation Understanding	Oct 30, 2024	Graph Generation, Predicate Classification	Code Available
Exploring Temporal Information for Improved Video Understanding	May 25, 2019	Action Recognition, Optical Flow Estimation	Code Available
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding	Apr 30, 2025	Video Understanding	Code Available
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding	Oct 1, 2024	Contrastive Learning, Hallucination	Code Available
Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs	Dec 18, 2021	Graph Generation, Object	Code Available
Screencast Tutorial Video Understanding	Jun 1, 2020	object-detection, Object Detection	Code Available
Video Object Segmentation using Supervoxel-Based Gerrymandering	Apr 18, 2017	Object, Semantic Segmentation	Code Available
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding	May 29, 2025	Avg, Video Understanding	Code Available
Representation Flow for Action Recognition	Oct 2, 2018	Action Classification, Action Recognition	Code Available
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition	Mar 30, 2017	Action Classification, Action Recognition	Code Available
Relation-aware Hierarchical Attention Framework for Video Question Answering	May 13, 2021	Question Answering, Relation	Code Available
Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition	Nov 1, 2021	Action Recognition, Person Re-Identification	Code Available
Recurrent Space-time Graph Neural Networks	Apr 11, 2019	Action Recognition, Human-Object Interaction Detection	Code Available
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos	May 26, 2025	Attribute, Video Understanding	Code Available
ACVUBench: Audio-Centric Video Understanding Benchmark	Mar 25, 2025	Video Understanding	Code Available
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures	May 30, 2019	Action Classification, Action Recognition	Code Available
Win-Fail Action Recognition	Feb 15, 2021	Action Recognition, Action Understanding	Code Available
VideoQA in the Era of LLMs: An Empirical Study	Aug 8, 2024	Multimodal Large Language Model, Video Question Answering	Code Available
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark	Oct 2, 2024	Unusual Activity Localization, Video Understanding	Code Available
ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment	Jun 28, 2025	Dynamic Time Warping, Large Language Model	Code Available
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization	Jun 17, 2025	Multi-Instance Retrieval, Retrieval	Code Available
Enhancing Temporal Modeling of Video LLMs via Time Gating	Oct 8, 2024	MVBench, Question Answering	Code Available

