Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 326–350 of 1149 papers

Title	Date	Tasks	Status	Hype
TokenLearner: Adaptive Space-Time Tokenization for Videos	Dec 1, 2021	Representation LearningVideo Recognition	CodeCode Available	1
End-to-End Referring Video Object Segmentation with Multimodal Transformers	Nov 29, 2021	Inductive BiasInstance Segmentation	CodeCode Available	1
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning	Nov 25, 2021	Caption GenerationQuestion Answering	CodeCode Available	1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing	Nov 24, 2021	audio-visual event localizationVideo Understanding	CodeCode Available	1
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling	Nov 24, 2021	Question AnsweringRetrieval	CodeCode Available	1
Revisiting spatio-temporal layouts for compositional action recognition	Nov 2, 2021	Action ClassificationAction Detection	CodeCode Available	1
Relational Self-Attention: What's Missing in Attention for Video Understanding	Nov 2, 2021	Action RecognitionTemporal Action Localization	CodeCode Available	1
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions	Oct 13, 2021	BenchmarkingComputational Efficiency	CodeCode Available	1
Object-Region Video Transformers	Oct 13, 2021	Action DetectionAction Recognition	CodeCode Available	1
Learning Temporally Causal Latent Processes from General Temporal Data	Oct 11, 2021	Causal DiscoveryRepresentation Learning	CodeCode Available	1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization	Sep 30, 2021	Video SummarizationVideo Understanding	CodeCode Available	1
Learning Temporally Latent Causal Processes from General Temporal Data	Sep 29, 2021	Causal DiscoveryDisentanglement	CodeCode Available	1
Towards High-Quality Temporal Action Detection with Sparse Proposals	Sep 18, 2021	Action DetectionAvg	CodeCode Available	1
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization	Aug 14, 2021	Action LocalizationMultiple Instance Learning	CodeCode Available	1
AutoVideo: An Automated Video Action Recognition System	Aug 9, 2021	Action RecognitionAutoML	CodeCode Available	1
Token Shift Transformer for Video Classification	Aug 5, 2021	ClassificationComputational Efficiency	CodeCode Available	1
Elaborative Rehearsal for Zero-shot Action Recognition	Aug 5, 2021	Action RecognitionFew-Shot Learning	CodeCode Available	1
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization	Aug 4, 2021	Contrastive LearningRepresentation Learning	CodeCode Available	1
Spatial-Temporal Transformer for Dynamic Scene Graph Generation	Jul 26, 2021	DecoderScene Graph Generation	CodeCode Available	1
Disentangle Your Dense Object Detector	Jul 7, 2021	DisentanglementObject	CodeCode Available	1
Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection	Jun 28, 2021	Action RecognitionAction Spotting	CodeCode Available	1
Can An Image Classifier Suffice For Action Recognition?	Jun 26, 2021	Action Recognitionimage-classification	CodeCode Available	1
Towards Long-Form Video Understanding	Jun 21, 2021	Action RecognitionForm	CodeCode Available	1
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?	Jun 21, 2021	Action ClassificationImage Classification	CodeCode Available	1
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning	Jun 21, 2021	Action ClassificationAction Recognition	CodeCode Available	1

Show:10 25 50

← PrevPage 14 of 46Next →

No leaderboard results yet.