Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 201–225 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
Crossover Learning for Fast Online Video Instance Segmentation	Apr 13, 2021	Instance SegmentationSemantic Segmentation	CodeCode Available	1	5
IntentVizor: Towards Generic Query Guided Interactive Video Summarization	Sep 30, 2021	Video SummarizationVideo Understanding	CodeCode Available	1	5
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model	Aug 15, 2023	DecoderObject	CodeCode Available	1	5
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization	Aug 12, 2024	Action LocalizationTemporal Action Localization	CodeCode Available	1	5
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions	Jun 19, 2021	Question AnsweringVideo Question Answering	CodeCode Available	1	5
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding	May 14, 2024	Action DetectionGPU	CodeCode Available	1	5
Object-Region Video Transformers	Oct 13, 2021	Action DetectionAction Recognition	CodeCode Available	1	5
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation	Dec 16, 2021	Contrastive LearningRepresentation Learning	CodeCode Available	1	5
Contrastive Masked Autoencoders for Self-Supervised Video Hashing	Nov 21, 2022	DecoderRetrieval	CodeCode Available	1	5
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives	Feb 4, 2025	Video Understanding	CodeCode Available	1	5
Occluded Video Instance Segmentation: A Benchmark	Feb 2, 2021	Instance SegmentationSegmentation	CodeCode Available	1	5
DEVIAS: Learning Disentangled Video Representations of Action and Scene	Nov 30, 2023	Action RecognitionDecoder	CodeCode Available	1	5
Panoptic Video Scene Graph Generation	Nov 28, 2023	Graph GenerationPanoptic Scene Graph Generation	CodeCode Available	1	5
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos	Jun 3, 2024	Mistake DetectionOnline Mistake Detection	CodeCode Available	1	5
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding	Sep 27, 2024	Video UnderstandingVisual Reasoning	CodeCode Available	1	5
Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation	Oct 31, 2024	Action SegmentationAction Understanding	CodeCode Available	1	5
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation	Jan 14, 2025	MambaVideo Understanding	CodeCode Available	1	5
Disentangle Your Dense Object Detector	Jul 7, 2021	DisentanglementObject	CodeCode Available	1	5
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering	May 30, 2022	counterfactualDescriptive	CodeCode Available	1	5
DisTime: Distribution-based Time Representation for Video Large Language Models	May 30, 2025	Temporal LocalizationVideo Understanding	CodeCode Available	1	5
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection	May 5, 2022	Action Detectionobject-detection	CodeCode Available	1	5
Learning Optical Flow with Adaptive Graph Reasoning	Feb 8, 2022	Motion EstimationOptical Flow Estimation	CodeCode Available	1	5
From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities	Jan 10, 2025	Human-Object Interaction DetectionKnowledge Distillation	CodeCode Available	1	5
Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions	May 19, 2022	Contrastive LearningSelf-Supervised Learning	CodeCode Available	1	5
Grounded Question-Answering in Long Egocentric Videos	Dec 11, 2023	Video GroundingVideo Question Answering	CodeCode Available	1	5

Show:10 25 50

← PrevPage 9 of 46Next →

No leaderboard results yet.