SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 901-950 of 1149 papers

Title | Status | Hype
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization | Code | 1
Spatial-Temporal Transformer for Dynamic Scene Graph Generation | Code | 1
CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding | - | 0
Disentangle Your Dense Object Detector | Code | 1
Spatio-Temporal Context for Action Detection | - | 0
Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection | Code | 1
Can An Image Classifier Suffice For Action Recognition? | Code | 1
Video Swin Transformer | Code | 2
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | Code | 1
Towards Long-Form Video Understanding | Code | 1
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | Code | 1
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions | Code | 1
Learning the Predictability of the Future | Code | 1
Discerning Generic Event Boundaries in Long-Form Wild Videos | - | 0
End-to-end Temporal Action Detection with Transformer | Code | 1
Long-Short Temporal Contrastive Learning of Video Transformers | - | 0
C^3: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues | - | 0
Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention | Code | 1
VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization | Code | 1
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition | - | 0
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking | - | 0
Technical Report: Temporal Aggregate Representations | Code | 1
Transformed ROIs for Capturing Visual Transformations in Videos | - | 0
A Study On the Effects of Pre-processing On Spatio-temporal Action Recognition Using Spiking Neural Networks Trained with STDP | - | 0
Highlight Timestamp Detection Model for Comedy Videos via Multimodal Sentiment Analysis | - | 0
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization | Code | 1
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | - | 0
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions | Code | 1
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions | Code | 1
Relation-aware Hierarchical Attention Framework for Video Question Answering | Code | 0
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions | - | 0
Stochastic Image-to-Video Synthesis using cINNs | Code | 1
FrameExit: Conditional Early Exiting for Efficient Video Recognition | Code | 1
Skimming and Scanning for Untrimmed Video Action Recognition | - | 0
Temporal Query Networks for Fine-grained Video Understanding | - | 0
Camera Calibration and Player Localization in SoccerNet-v2 and Investigation of their Representations for Action Spotting | - | 0
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
Temporally smooth online action detection using cycle-consistent future anticipation | Code | 0
Adaptive Intermediate Representations for Video Understanding | - | 0
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation | - | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
TubeR: Tubelet Transformer for Video Action Detection | Code | 1
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers | - | 0
Visual Semantic Role Labeling for Video Understanding | Code | 1
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation | - | 0
Unified Graph Structured Models for Video Understanding | - | 0
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization | - | 0
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Code | 1
Temporal Context Aggregation Network for Temporal Action Proposal Refinement | Code | 1
Page 19 of 23

No leaderboard results yet.