Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 301–350 of 1149 papers

Title | Status | Hype
Spotting Temporally Precise, Fine-Grained Events in Video | Code | 1
Clover: Towards A Unified Video-Language Alignment and Fusion Model | Code | 1
Is Appearance Free Action Recognition Possible? | Code | 1
Federated Self-supervised Learning for Video Understanding | Code | 1
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning | Code | 1
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner | Code | 1
Stand-Alone Inter-Frame Attention in Video Models | Code | 1
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering | Code | 1
Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions | Code | 1
ETAD: Training Action Detection End to End on a Laptop | Code | 1
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | Code | 1
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions | Code | 1
Temporal Alignment Networks for Long-term Video | Code | 1
An Empirical Study of End-to-End Temporal Action Detection | Code | 1
Long Movie Clip Classification with State-Space Video Models | Code | 1
SPAct: Self-supervised Privacy Preservation for Action Recognition | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment | Code | 1
Learning Optical Flow with Adaptive Graph Reasoning | Code | 1
A Dataset for Medical Instructional Video Classification and Question Answering | Code | 1
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization | Code | 1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation | Code | 1
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection | Code | 1
Prompting Visual-Language Models for Efficient Video Understanding | Code | 1
TokenLearner: Adaptive Space-Time Tokenization for Videos | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
Relational Self-Attention: What's Missing in Attention for Video Understanding | Code | 1
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Code | 1
Object-Region Video Transformers | Code | 1
Learning Temporally Causal Latent Processes from General Temporal Data | Code | 1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Code | 1
Learning Temporally Latent Causal Processes from General Temporal Data | Code | 1
Towards High-Quality Temporal Action Detection with Sparse Proposals | Code | 1
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization | Code | 1
AutoVideo: An Automated Video Action Recognition System | Code | 1
Token Shift Transformer for Video Classification | Code | 1
Elaborative Rehearsal for Zero-shot Action Recognition | Code | 1
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization | Code | 1
Spatial-Temporal Transformer for Dynamic Scene Graph Generation | Code | 1
Disentangle Your Dense Object Detector | Code | 1
Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection | Code | 1
Can An Image Classifier Suffice For Action Recognition? | Code | 1
Towards Long-Form Video Understanding | Code | 1
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | Code | 1
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | Code | 1

No leaderboard results yet.