Video Understanding

A crucial task in Video Understanding is to recognise and localise, in space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 901-950 of 1149 papers

| Title | Status | Hype |
|---|---|---|
| Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens | | 0 |
| Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey | | 0 |
| Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding | | 0 |
| i-Code: An Integrative and Composable Multimodal Learning Framework | | 0 |
| Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering | | 0 |
| Contrastive Language-Action Pre-training for Temporal Localization | | 0 |
| Causal Reasoning Meets Visual Representation Learning: A Prospective Study | | 0 |
| Revealing Occlusions with 4D Neural Fields | | 0 |
| Less than Few: Self-Shot Video Instance Segmentation | | 0 |
| ActAR: Actor-Driven Pose Embeddings for Video Action Recognition | | 0 |
| Adversarial Machine Learning Attacks Against Video Anomaly Detection Systems | | 0 |
| MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization | | 0 |
| PYSKL: a toolbox for skeleton-based video understanding | | 0 |
| FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Code | 0 |
| On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis | Code | 0 |
| Human Gaze Guided Attention for Surgical Activity Recognition | | 0 |
| Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding | | 0 |
| Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection | | 0 |
| Concept Graph Neural Networks for Surgical Video Understanding | | 0 |
| Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations | | 0 |
| A Coding Framework and Benchmark towards Low-Bitrate Video Understanding | Code | 0 |
| Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition | Code | 0 |
| End-to-end Generative Pretraining for Multimodal Video Captioning | | 0 |
| Multiview Transformers for Video Recognition | | 0 |
| MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound | | 0 |
| Memory-Guided Semantic Learning Network for Temporal Sentence Grounding | | 0 |
| VRDFormer: End-to-End Video Visual Relation Detection With Transformers | | 0 |
| YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset | | 0 |
| Improving Video Model Transfer With Dynamic Representation Learning | | 0 |
| UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | | 0 |
| Recurring the Transformer for Video Action Recognition | | 0 |
| Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs | Code | 0 |
| Discrete neural representations for explainable anomaly detection | | 0 |
| Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search | | 0 |
| Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Code | 0 |
| Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips | | 0 |
| LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | | 0 |
| UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | | 0 |
| Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | | 0 |
| Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge | | 0 |
| Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition | Code | 0 |
| Gradient Frequency Modulation for Visually Explaining Video Understanding Models | | 0 |
| Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding | | 0 |
| Leveraging Local Temporal Information for Multimodal Scene Classification | | 0 |
| Can't Fool Me: Adversarially Robust Transformer for Video Understanding | | 0 |
| NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels | Code | 0 |
| CLIP4Caption: CLIP for Video Caption | | 0 |
| TAda! Temporally-Adaptive Convolutions for Video Understanding | Code | 0 |
| Toward a Human-Level Video Understanding Intelligence | | 0 |
| Efficient Modelling Across Time of Human Actions and Interactions | | 0 |

No leaderboard results yet.