SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 951–1000 of 1149 papers

Title | Status | Hype
Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction | - | 0
OBJECT DYNAMICS DISTILLATION FOR SCENE DECOMPOSITION AND REPRESENTATION | - | 0
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark | Code | 0
A Multimodal Sentiment Dataset for Video Recommendation | - | 0
Overview of Tencent Multi-modal Ads Video Understanding Challenge | - | 0
Multi-modal Representation Learning for Video Advertisement Content Structuring | - | 0
Spatio-Temporal Perturbations for Video Attribution | Code | 0
LIGAR: Lightweight General-purpose Action Recognition | - | 0
Identity-aware Graph Memory Network for Action Detection | - | 0
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection | - | 0
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning | - | 0
CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding | - | 0
Spatio-Temporal Context for Action Detection | - | 0
Discerning Generic Event Boundaries in Long-Form Wild Videos | - | 0
Long-Short Temporal Contrastive Learning of Video Transformers | - | 0
C^3: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues | - | 0
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition | - | 0
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking | - | 0
Transformed ROIs for Capturing Visual Transformations in Videos | - | 0
A Study On the Effects of Pre-processing On Spatio-temporal Action Recognition Using Spiking Neural Networks Trained with STDP | - | 0
Highlight Timestamp Detection Model for Comedy Videos via Multimodal Sentiment Analysis | - | 0
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | - | 0
Relation-aware Hierarchical Attention Framework for Video Question Answering | Code | 0
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions | - | 0
Skimming and Scanning for Untrimmed Video Action Recognition | - | 0
Camera Calibration and Player Localization in SoccerNet-v2 and Investigation of their Representations for Action Spotting | - | 0
Temporal Query Networks for Fine-grained Video Understanding | - | 0
Temporally smooth online action detection using cycle-consistent future anticipation | Code | 0
Adaptive Intermediate Representations for Video Understanding | - | 0
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation | - | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers | - | 0
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation | - | 0
Unified Graph Structured Models for Video Understanding | - | 0
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization | - | 0
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | - | 0
Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training | - | 0
PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization | - | 0
Unsupervised Motion Representation Enhanced Network for Action Recognition | - | 0
Win-Fail Action Recognition | Code | 0
CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization | - | 0
Global Self-Attention Networks | - | 0
Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization | - | 0
Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion | - | 0
Understanding Action Sequences based on Video Captioning for Learning-from-Observation | - | 0
t-EVA: Time-Efficient t-SNE Video Annotation | - | 0
Can Temporal Information Help with Contrastive Self-Supervised Learning? | - | 0
Cycle-Contrast for Self-Supervised Video Representation Learning | - | 0
Co-attentional Transformers for Story-Based Video Understanding | - | 0
Egok360: A 360 Egocentric Kinetic Human Activity Video Dataset | - | 0

No leaderboard results yet.