Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Title	Date	Tasks	Status
Vamos: Versatile Action Models for Video Understanding	Nov 22, 2023	EgoSchema, Hard Attention	Code Available
SPOT! Revisiting Video-Language Models for Event Understanding	Nov 21, 2023	Attribute, Video Understanding	Unverified
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab	Nov 1, 2023	Action Recognition, Video Understanding	Unverified
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection	Nov 1, 2023	Action Detection, Classification	Unverified
Beyond still images: Temporal features and input variance resilience	Nov 1, 2023	Video Understanding	Unverified
Videoprompter: an ensemble of foundational models for zero-shot video understanding	Oct 23, 2023	Action Recognition, Descriptive	Unverified
Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding	Oct 19, 2023	Relation, Video Understanding	Unverified
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks	Oct 7, 2023	Action Recognition, Multiple-choice	Unverified
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model	Oct 2, 2023	Autonomous Driving, Language Modeling	Unverified
Telling Stories for Common Sense Zero-Shot Action Recognition	Sep 29, 2023	Action Recognition, Articles	Code Available
M^33D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding	Sep 26, 2023	2D Semantic Segmentation, Action Detection	Unverified
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges	Sep 25, 2023	Anomaly Detection, Dense Video Captioning	Unverified
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding	Sep 20, 2023	Action Localization, Form	Unverified
Learning Dynamic MRI Reconstruction with Convolutional Network Assisted Reconstruction Swin Transformer	Sep 19, 2023	Anatomy, Computational Efficiency	Unverified
Language as the Medium: Multimodal Video Classification through text only	Sep 19, 2023	Action Recognition, Video Classification	Unverified
Judging a video by its bitstream cover	Sep 14, 2023	Video Understanding	Code Available
Motion-Guided Masking for Spatiotemporal Representation Learning	Aug 24, 2023	Domain Adaptation, Representation Learning	Unverified
MOFO: MOtion FOcused Self-Supervision for Video Understanding	Aug 23, 2023	Action Classification, Action Recognition	Code Available
Are current long-term video understanding datasets long-term?	Aug 22, 2023	Action Recognition, Video Understanding	Code Available
Audio-Visual Glance Network for Efficient Video Recognition	Aug 18, 2023	Video Recognition, Video Understanding	Unverified
Temporally-Adaptive Models for Efficient Video Understanding	Aug 10, 2023	Action Classification, Action Recognition	Unverified
M^3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition	Aug 6, 2023	Action Recognition, Decision Making	Unverified
DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation	Jul 31, 2023	Action Segmentation, Human-Object Interaction Detection	Unverified
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation	Jul 13, 2023	Action Recognition, Contrastive Learning	Unverified
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding	Jul 9, 2023	Action Recognition, Action Segmentation	Code Available
VideoGLUE: Video General Understanding Evaluation of Foundation Models	Jul 6, 2023	Action Recognition, Temporal Localization	Unverified
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models	Jun 28, 2023	Retrieval, Video Retrieval	Code Available
Temporal Action Proposal Generation With Action Frequency Adaptive Network	Jun 23, 2023	Knowledge Distillation, Temporal Action Proposal Generation	Code Available
Learning Space-Time Semantic Correspondences	Jun 16, 2023	Imitation Learning, Semantic correspondence	Unverified
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment	Jun 8, 2023	Video Understanding	Unverified
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning	Jun 4, 2023	Benchmarking, Contrastive Learning	Unverified
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning	Jun 1, 2023	Incremental Learning, Knowledge Distillation	Code Available
Action Sensitivity Learning for Temporal Action Localization	May 25, 2023	Action Localization, Moment Queries	Unverified
Learning Higher-order Object Interactions for Keypoint-based Video Understanding	May 16, 2023	Action Localization, Action Recognition	Unverified
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot	May 16, 2023	Emotion Classification, Question Answering	Code Available
Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection	May 14, 2023	Image Reconstruction, vehicle detection	Unverified
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System	Apr 27, 2023	Video Understanding	Unverified
MRSN: Multi-Relation Support Network for Video Action Detection	Apr 24, 2023	Action Detection, Relation	Unverified
Search-Map-Search: A Frame Selection Paradigm for Action Recognition	Apr 20, 2023	Action Recognition, Heuristic Search	Unverified
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision	Apr 15, 2023	Language Modeling, Language Modelling	Unverified
Therbligs in Action: Video Understanding through Motion Primitives	Apr 6, 2023	Action Anticipation, Action Recognition	Unverified
DOAD: Decoupled One Stage Action Detection Network	Apr 1, 2023	Action Detection, Action Recognition	Unverified
SVT: Supertoken Video Transformer for Efficient Video Understanding	Apr 1, 2023	Video Understanding	Unverified
System-status-aware Adaptive Network for Online Streaming Video Understanding	Mar 28, 2023	Streaming video understanding, Video Understanding	Unverified
Selective Structured State-Spaces for Long-Form Video Understanding	Mar 25, 2023	Contrastive Learning, Form	Unverified
Leaping Into Memories: Space-Time Deep Feature Synthesis	Mar 17, 2023	Diversity, Video Understanding	Code Available
Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks	Feb 24, 2023	Classification, Data Augmentation	Unverified
MINOTAUR: Multi-task Video Grounding From Multimodal Queries	Feb 16, 2023	Action Detection, Sentence	Code Available
Semi-Parametric Video-Grounded Text Generation	Jan 27, 2023	Language Modeling, Language Modelling	Unverified
Building Scalable Video Understanding Benchmarks through Sports	Jan 17, 2023	Video Understanding	Unverified
