Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 851–900 of 1149 papers

Title	Date	Tasks	Status	Hype
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection	Dec 9, 2021	Boundary Detection, Diversity	Code Available	1
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search	Dec 9, 2021	Neural Architecture Search, Video Recognition	Unverified	0
Prompting Visual-Language Models for Efficient Video Understanding	Dec 8, 2021	Action Recognition, Language Modelling	Code Available	1
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning	Dec 7, 2021	Contrastive Learning, Representation Learning	Code Available	0
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips	Dec 2, 2021	Action Recognition, Video Understanding	Unverified	0
TokenLearner: Adaptive Space-Time Tokenization for Videos	Dec 1, 2021	Representation Learning, Video Recognition	Code Available	1
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering	Nov 29, 2021	Diversity, Question Answering	Unverified	0
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection	Nov 29, 2021	Boundary Detection, Contrastive Learning	Unverified	0
End-to-End Referring Video Object Segmentation with Multimodal Transformers	Nov 29, 2021	Inductive Bias, Instance Segmentation	Code Available	1
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning	Nov 25, 2021	Caption Generation, Question Answering	Code Available	1
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling	Nov 24, 2021	Question Answering, Retrieval	Code Available	1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing	Nov 24, 2021	Audio-Visual Event Localization, Video Understanding	Code Available	1
PyTorchVideo: A Deep Learning Library for Video Understanding	Nov 18, 2021	Deep Learning, Self-Supervised Learning	Code Available	2
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework	Nov 16, 2021	Multiple-choice, Question Answering	Unverified	0
Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge	Nov 15, 2021	Instance Segmentation, Object Recognition	Unverified	0
Attention Mechanisms in Computer Vision: A Survey	Nov 15, 2021	Image Classification	Code Available	2
Relational Self-Attention: What's Missing in Attention for Video Understanding	Nov 2, 2021	Action Recognition, Temporal Action Localization	Code Available	1
Revisiting spatio-temporal layouts for compositional action recognition	Nov 2, 2021	Action Classification, Action Detection	Code Available	1
Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition	Nov 1, 2021	Action Recognition, Person Re-Identification	Code Available	0
Gradient Frequency Modulation for Visually Explaining Video Understanding Models	Nov 1, 2021	Action Recognition, Temporal Action Localization	Unverified	0
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding	Oct 31, 2021	Action Recognition, Text Detection	Unverified	0
Can't Fool Me: Adversarially Robust Transformer for Video Understanding	Oct 26, 2021	Image Classification	Unverified	0
Leveraging Local Temporal Information for Multimodal Scene Classification	Oct 26, 2021	Classification, Scene Classification	Unverified	0
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions	Oct 13, 2021	Benchmarking, Computational Efficiency	Code Available	1
CLIP4Caption: CLIP for Video Caption	Oct 13, 2021	Decoder, Sentence	Unverified	0
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels	Oct 13, 2021	Action Classification, Self-Supervised Learning	Code Available	0
Object-Region Video Transformers	Oct 13, 2021	Action Detection, Action Recognition	Code Available	1
TAda! Temporally-Adaptive Convolutions for Video Understanding	Oct 12, 2021	Action Classification, Action Recognition	Code Available	0
Learning Temporally Causal Latent Processes from General Temporal Data	Oct 11, 2021	Causal Discovery, Representation Learning	Code Available	1
Toward a Human-Level Video Understanding Intelligence	Oct 8, 2021	AI Agent, Video Understanding	Unverified	0
Efficient Modelling Across Time of Human Actions and Interactions	Oct 5, 2021	Action Recognition, Video Understanding	Unverified	0
Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction	Oct 3, 2021	Action Recognition, Representation Learning	Unverified	0
IntentVizor: Towards Generic Query Guided Interactive Video Summarization	Sep 30, 2021	Video Summarization, Video Understanding	Code Available	1
Object Dynamics Distillation for Scene Decomposition and Representation	Sep 29, 2021	Object, Predict Future Video Frames	Unverified	0
Learning Temporally Latent Causal Processes from General Temporal Data	Sep 29, 2021	Causal Discovery, Disentanglement	Code Available	1
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device	Sep 27, 2021	Video Recognition, Video Understanding	Code Available	2
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark	Sep 23, 2021	Video Understanding	Code Available	0
Towards High-Quality Temporal Action Detection with Sparse Proposals	Sep 18, 2021	Action Detection, Avg	Code Available	1
A Multimodal Sentiment Dataset for Video Recommendation	Sep 17, 2021	Multimodal Sentiment Analysis, Sentiment Analysis	Unverified	0
Overview of Tencent Multi-modal Ads Video Understanding Challenge	Sep 16, 2021	Multi-Label Classification	Unverified	0
Multi-modal Representation Learning for Video Advertisement Content Structuring	Sep 4, 2021	Representation Learning, Re-Ranking	Unverified	0
Spatio-Temporal Perturbations for Video Attribution	Sep 1, 2021	Video Understanding	Code Available	0
LIGAR: Lightweight General-purpose Action Recognition	Aug 30, 2021	Action Recognition, Gesture Recognition	Unverified	0
Identity-aware Graph Memory Network for Action Detection	Aug 26, 2021	Action Detection, Graph Neural Network	Unverified	0
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization	Aug 14, 2021	Action Localization, Multiple Instance Learning	Code Available	1
AutoVideo: An Automated Video Action Recognition System	Aug 9, 2021	Action Recognition, AutoML	Code Available	1
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection	Aug 8, 2021	Action Detection, Knowledge Distillation	Unverified	0
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning	Aug 5, 2021	Attribute, Caption Generation	Unverified	0
Elaborative Rehearsal for Zero-shot Action Recognition	Aug 5, 2021	Action Recognition, Few-Shot Learning	Code Available	1
Token Shift Transformer for Video Classification	Aug 5, 2021	Classification, Computational Efficiency	Code Available	1

No leaderboard results yet.