
Video Understanding

A core task in video understanding is to recognise and localise, in both space and time, the actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 801-850 of 1149 papers

Title | Status | Hype
Vamos: Versatile Action Models for Video Understanding | Code | 0
SPOT! Revisiting Video-Language Models for Event Understanding | | 0
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab | | 0
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection | | 0
Beyond still images: Temporal features and input variance resilience | | 0
Videoprompter: an ensemble of foundational models for zero-shot video understanding | | 0
Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding | | 0
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | | 0
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model | | 0
Telling Stories for Common Sense Zero-Shot Action Recognition | Code | 0
M^33D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding | | 0
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges | | 0
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding | | 0
Learning Dynamic MRI Reconstruction with Convolutional Network Assisted Reconstruction Swin Transformer | | 0
Language as the Medium: Multimodal Video Classification through text only | | 0
Judging a video by its bitstream cover | Code | 0
Motion-Guided Masking for Spatiotemporal Representation Learning | | 0
MOFO: MOtion FOcused Self-Supervision for Video Understanding | Code | 0
Are current long-term video understanding datasets long-term? | Code | 0
Audio-Visual Glance Network for Efficient Video Recognition | | 0
Temporally-Adaptive Models for Efficient Video Understanding | | 0
M^3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition | | 0
DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation | | 0
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | | 0
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Code | 0
VideoGLUE: Video General Understanding Evaluation of Foundation Models | | 0
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models | Code | 0
Temporal Action Proposal Generation With Action Frequency Adaptive Network | Code | 0
Learning Space-Time Semantic Correspondences | | 0
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment | | 0
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | | 0
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Code | 0
Action Sensitivity Learning for Temporal Action Localization | | 0
Learning Higher-order Object Interactions for Keypoint-based Video Understanding | | 0
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot | Code | 0
Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection | | 0
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System | | 0
MRSN: Multi-Relation Support Network for Video Action Detection | | 0
Search-Map-Search: A Frame Selection Paradigm for Action Recognition | | 0
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision | | 0
Therbligs in Action: Video Understanding through Motion Primitives | | 0
DOAD: Decoupled One Stage Action Detection Network | | 0
SVT: Supertoken Video Transformer for Efficient Video Understanding | | 0
System-status-aware Adaptive Network for Online Streaming Video Understanding | | 0
Selective Structured State-Spaces for Long-Form Video Understanding | | 0
Leaping Into Memories: Space-Time Deep Feature Synthesis | Code | 0
Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks | | 0
MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Code | 0
Semi-Parametric Video-Grounded Text Generation | | 0
Building Scalable Video Understanding Benchmarks through Sports | | 0
