SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 926-950 of 1149 papers

Title | Status | Hype
Memory-Guided Semantic Learning Network for Temporal Sentence Grounding | | 0
VRDFormer: End-to-End Video Visual Relation Detection With Transformers | | 0
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset | | 0
Improving Video Model Transfer With Dynamic Representation Learning | | 0
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | | 0
Recurring the Transformer for Video Action Recognition | | 0
Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs | Code | 0
Discrete neural representations for explainable anomaly detection | | 0
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search | | 0
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Code | 0
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips | | 0
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | | 0
UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | | 0
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | | 0
Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge | | 0
Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition | Code | 0
Gradient Frequency Modulation for Visually Explaining Video Understanding Models | | 0
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding | | 0
Leveraging Local Temporal Information for Multimodal Scene Classification | | 0
Can't Fool Me: Adversarially Robust Transformer for Video Understanding | | 0
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels | Code | 0
CLIP4Caption: CLIP for Video Caption | | 0
TAda! Temporally-Adaptive Convolutions for Video Understanding | Code | 0
Toward a Human-Level Video Understanding Intelligence | | 0
Efficient Modelling Across Time of Human Actions and Interactions | | 0
Page 38 of 46

No leaderboard results yet.