Video Understanding

A crucial task in video understanding is to recognise and localise, in space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 301–350 of 1149 papers

Title	Date	Tasks	Status	Hype
Spotting Temporally Precise, Fine-Grained Events in Video	Jul 20, 2022	Action Detection, Action Spotting	Code Available	1
Clover: Towards A Unified Video-Language Alignment and Fusion Model	Jul 16, 2022	Language Modeling, Language Modelling	Code Available	1
Is Appearance Free Action Recognition Possible?	Jul 13, 2022	Action Recognition, Optical Flow Estimation	Code Available	1
Federated Self-supervised Learning for Video Understanding	Jul 5, 2022	Action Recognition, Federated Learning	Code Available	1
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning	Jun 27, 2022	Action Classification, Action Recognition	Code Available	1
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner	Jun 18, 2022	Decoder, Semantic Segmentation	Code Available	1
Stand-Alone Inter-Frame Attention in Video Models	Jun 14, 2022	Action Classification, Action Recognition	Code Available	1
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector	Jun 7, 2022	Action Classification, Action Detection	Code Available	1
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering	May 30, 2022	counterfactual, Descriptive	Code Available	1
Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions	May 19, 2022	Contrastive Learning, Self-Supervised Learning	Code Available	1
ETAD: Training Action Detection End to End on a Laptop	May 14, 2022	Action Detection, GPU	Code Available	1
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection	May 5, 2022	Action Detection, object-detection	Code Available	1
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions	Apr 21, 2022	Action Detection, Video Understanding	Code Available	1
Temporal Alignment Networks for Long-term Video	Apr 6, 2022	Action Recognition, Action Segmentation	Code Available	1
An Empirical Study of End-to-End Temporal Action Detection	Apr 6, 2022	Action Classification, Action Detection	Code Available	1
Long Movie Clip Classification with State-Space Video Models	Apr 4, 2022	Classification, Decoder	Code Available	1
SPAct: Self-supervised Privacy Preservation for Action Recognition	Mar 29, 2022	Action Classification, Action Recognition	Code Available	1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?	Mar 27, 2022	Self-Supervised Learning, Sensitivity	Code Available	1
Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment	Feb 28, 2022	3D Action Recognition, Action Analysis	Code Available	1
Learning Optical Flow with Adaptive Graph Reasoning	Feb 8, 2022	Motion Estimation, Optical Flow Estimation	Code Available	1
A Dataset for Medical Instructional Video Classification and Question Answering	Jan 30, 2022	Classification, Question Answering	Code Available	1
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization	Dec 27, 2021	Retrieval, Supervised Video Summarization	Code Available	1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation	Dec 16, 2021	Contrastive Learning, Representation Learning	Code Available	1
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection	Dec 9, 2021	Boundary Detection, Diversity	Code Available	1
Prompting Visual-Language Models for Efficient Video Understanding	Dec 8, 2021	Action Recognition, Language Modelling	Code Available	1
TokenLearner: Adaptive Space-Time Tokenization for Videos	Dec 1, 2021	Representation Learning, Video Recognition	Code Available	1
End-to-End Referring Video Object Segmentation with Multimodal Transformers	Nov 29, 2021	Inductive Bias, Instance Segmentation	Code Available	1
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning	Nov 25, 2021	Caption Generation, Question Answering	Code Available	1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing	Nov 24, 2021	audio-visual event localization, Video Understanding	Code Available	1
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling	Nov 24, 2021	Question Answering, Retrieval	Code Available	1
Revisiting spatio-temporal layouts for compositional action recognition	Nov 2, 2021	Action Classification, Action Detection	Code Available	1
Relational Self-Attention: What's Missing in Attention for Video Understanding	Nov 2, 2021	Action Recognition, Temporal Action Localization	Code Available	1
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions	Oct 13, 2021	Benchmarking, Computational Efficiency	Code Available	1
Object-Region Video Transformers	Oct 13, 2021	Action Detection, Action Recognition	Code Available	1
Learning Temporally Causal Latent Processes from General Temporal Data	Oct 11, 2021	Causal Discovery, Representation Learning	Code Available	1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization	Sep 30, 2021	Video Summarization, Video Understanding	Code Available	1
Learning Temporally Latent Causal Processes from General Temporal Data	Sep 29, 2021	Causal Discovery, Disentanglement	Code Available	1
Towards High-Quality Temporal Action Detection with Sparse Proposals	Sep 18, 2021	Action Detection, Avg	Code Available	1
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization	Aug 14, 2021	Action Localization, Multiple Instance Learning	Code Available	1
AutoVideo: An Automated Video Action Recognition System	Aug 9, 2021	Action Recognition, AutoML	Code Available	1
Token Shift Transformer for Video Classification	Aug 5, 2021	Classification, Computational Efficiency	Code Available	1
Elaborative Rehearsal for Zero-shot Action Recognition	Aug 5, 2021	Action Recognition, Few-Shot Learning	Code Available	1
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization	Aug 4, 2021	Contrastive Learning, Representation Learning	Code Available	1
Spatial-Temporal Transformer for Dynamic Scene Graph Generation	Jul 26, 2021	Decoder, Scene Graph Generation	Code Available	1
Disentangle Your Dense Object Detector	Jul 7, 2021	Disentanglement, Object	Code Available	1
Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection	Jun 28, 2021	Action Recognition, Action Spotting	Code Available	1
Can An Image Classifier Suffice For Action Recognition?	Jun 26, 2021	Action Recognition, image-classification	Code Available	1
Towards Long-Form Video Understanding	Jun 21, 2021	Action Recognition, Form	Code Available	1
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?	Jun 21, 2021	Action Classification, Image Classification	Code Available	1
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning	Jun 21, 2021	Action Classification, Action Recognition	Code Available	1



No leaderboard results yet.