Video Understanding

A crucial task of video understanding is to recognise and localise, in space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 901–950 of 1149 papers

Title	Date	Tasks	Status
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens	Jun 13, 2022	Action Recognition, Video Understanding	Unverified
Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey	Jun 5, 2022	3D Hand Pose Estimation, Domain Adaptation	Unverified
Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding	Jun 1, 2022	Knowledge Graphs, Video Understanding	Unverified
i-Code: An Integrative and Composable Multimodal Learning Framework	May 3, 2022	Contrastive Learning, Video Understanding	Unverified
Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering	May 1, 2022	Question Answering, Video Classification	Unverified
Contrastive Language-Action Pre-training for Temporal Localization	Apr 26, 2022	Action Localization, Contrastive Learning	Unverified
Causal Reasoning Meets Visual Representation Learning: A Prospective Study	Apr 26, 2022	Benchmarking, Out-of-Distribution Generalization	Unverified
Revealing Occlusions with 4D Neural Fields	Apr 22, 2022	Video Understanding	Unverified
Less than Few: Self-Shot Video Instance Segmentation	Apr 19, 2022	Few-Shot Learning, Instance Segmentation	Unverified
ActAR: Actor-Driven Pose Embeddings for Video Action Recognition	Apr 19, 2022	Action Recognition, Optical Flow Estimation	Unverified
Adversarial Machine Learning Attacks Against Video Anomaly Detection Systems	Apr 7, 2022	Anomaly Detection, BIG-bench Machine Learning	Unverified
MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization	Apr 6, 2022	Action Localization, Action Recognition	Unverified
PYSKL: a toolbox for skeleton-based video understanding	Apr 2, 2022	Skeleton Based Action Recognition, Video Understanding	Unverified
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks	Mar 24, 2022	Action Recognition, Retrieval	Code Available
On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis	Mar 15, 2022	Video Understanding	Code Available
Human Gaze Guided Attention for Surgical Activity Recognition	Mar 9, 2022	Activity Recognition, Video Understanding	Unverified
Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding	Mar 8, 2022	Contrastive Learning, Sentence	Unverified
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection	Mar 1, 2022	Avg, Boundary Detection	Unverified
Concept Graph Neural Networks for Surgical Video Understanding	Feb 27, 2022	Video Understanding	Unverified
Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations	Feb 21, 2022	Answer Generation, Video Understanding	Unverified
A Coding Framework and Benchmark towards Low-Bitrate Video Understanding	Feb 6, 2022	Video Compression, Video Understanding	Code Available
Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition	Jan 25, 2022	Action Recognition, Optical Flow Estimation	Code Available
End-to-end Generative Pretraining for Multimodal Video Captioning	Jan 20, 2022	Action Classification, Decoder	Unverified
Multiview Transformers for Video Recognition	Jan 12, 2022	Action Classification, Action Recognition	Unverified
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound	Jan 7, 2022	Action Classification, Navigate	Unverified
Memory-Guided Semantic Learning Network for Temporal Sentence Grounding	Jan 3, 2022	Sentence, Temporal Sentence Grounding	Unverified
VRDFormer: End-to-End Video Visual Relation Detection With Transformers	Jan 1, 2022	Object, Relation	Unverified
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset	Jan 1, 2022	Management, Segmentation	Unverified
Improving Video Model Transfer With Dynamic Representation Learning	Jan 1, 2022	Action Classification, Knowledge Distillation	Unverified
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection	Jan 1, 2022	Boundary Detection, Contrastive Learning	Unverified
Recurring the Transformer for Video Action Recognition	Jan 1, 2022	Action Recognition, GPU	Unverified
Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs	Dec 18, 2021	Graph Generation, Object	Code Available
Discrete neural representations for explainable anomaly detection	Dec 10, 2021	Anomaly Detection, Object	Unverified
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search	Dec 9, 2021	Neural Architecture Search, Video Recognition	Unverified
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning	Dec 7, 2021	Contrastive Learning, Representation Learning	Code Available
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips	Dec 2, 2021	Action Recognition, Video Understanding	Unverified
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering	Nov 29, 2021	Diversity, Question Answering	Unverified
UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection	Nov 29, 2021	Boundary Detection, Contrastive Learning	Unverified
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework	Nov 16, 2021	Multiple-choice, Question Answering	Unverified
Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge	Nov 15, 2021	Instance Segmentation, Object Recognition	Unverified
Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition	Nov 1, 2021	Action Recognition, Person Re-Identification	Code Available
Gradient Frequency Modulation for Visually Explaining Video Understanding Models	Nov 1, 2021	Action Recognition, Temporal Action Localization	Unverified
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding	Oct 31, 2021	Action Recognition, Text Detection	Unverified
Leveraging Local Temporal Information for Multimodal Scene Classification	Oct 26, 2021	Classification, Scene Classification	Unverified
Can't Fool Me: Adversarially Robust Transformer for Video Understanding	Oct 26, 2021	Image Classification	Unverified
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels	Oct 13, 2021	Action Classification, Self-Supervised Learning	Code Available
CLIP4Caption: CLIP for Video Caption	Oct 13, 2021	Decoder, Sentence	Unverified
TAda! Temporally-Adaptive Convolutions for Video Understanding	Oct 12, 2021	Action Classification, Action Recognition	Code Available
Toward a Human-Level Video Understanding Intelligence	Oct 8, 2021	AI Agent, Video Understanding	Unverified
Efficient Modelling Across Time of Human Actions and Interactions	Oct 5, 2021	Action Recognition, Video Understanding	Unverified

No leaderboard results yet.