SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 851–900 of 1149 papers

| Title | Status | Hype |
|---|---|---|
| Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection | Code | 1 |
| Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search |  | 0 |
| Prompting Visual-Language Models for Efficient Video Understanding | Code | 1 |
| Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Code | 0 |
| Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips |  | 0 |
| TokenLearner: Adaptive Space-Time Tokenization for Videos | Code | 1 |
| LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering |  | 0 |
| UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection |  | 0 |
| End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1 |
| SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Code | 1 |
| VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling | Code | 1 |
| MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1 |
| PyTorchVideo: A Deep Learning Library for Video Understanding | Code | 2 |
| Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework |  | 0 |
| Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge |  | 0 |
| Attention Mechanisms in Computer Vision: A Survey | Code | 2 |
| Relational Self-Attention: What's Missing in Attention for Video Understanding | Code | 1 |
| Revisiting spatio-temporal layouts for compositional action recognition | Code | 1 |
| Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition | Code | 0 |
| Gradient Frequency Modulation for Visually Explaining Video Understanding Models |  | 0 |
| Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding |  | 0 |
| Can't Fool Me: Adversarially Robust Transformer for Video Understanding |  | 0 |
| Leveraging Local Temporal Information for Multimodal Scene Classification |  | 0 |
| Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Code | 1 |
| CLIP4Caption: CLIP for Video Caption |  | 0 |
| NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels | Code | 0 |
| Object-Region Video Transformers | Code | 1 |
| TAda! Temporally-Adaptive Convolutions for Video Understanding | Code | 0 |
| Learning Temporally Causal Latent Processes from General Temporal Data | Code | 1 |
| Toward a Human-Level Video Understanding Intelligence |  | 0 |
| Efficient Modelling Across Time of Human Actions and Interactions |  | 0 |
| Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction |  | 0 |
| IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Code | 1 |
| OBJECT DYNAMICS DISTILLATION FOR SCENE DECOMPOSITION AND REPRESENTATION |  | 0 |
| Learning Temporally Latent Causal Processes from General Temporal Data | Code | 1 |
| TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device | Code | 2 |
| Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark | Code | 0 |
| Towards High-Quality Temporal Action Detection with Sparse Proposals | Code | 1 |
| A Multimodal Sentiment Dataset for Video Recommendation |  | 0 |
| Overview of Tencent Multi-modal Ads Video Understanding Challenge |  | 0 |
| Multi-modal Representation Learning for Video Advertisement Content Structuring |  | 0 |
| Spatio-Temporal Perturbations for Video Attribution | Code | 0 |
| LIGAR: Lightweight General-purpose Action Recognition |  | 0 |
| Identity-aware Graph Memory Network for Action Detection |  | 0 |
| Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization | Code | 1 |
| AutoVideo: An Automated Video Action Recognition System | Code | 1 |
| Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection |  | 0 |
| O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning |  | 0 |
| Elaborative Rehearsal for Zero-shot Action Recognition | Code | 1 |
| Token Shift Transformer for Video Classification | Code | 1 |

No leaderboard results yet.