SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 526–550 of 1149 papers

Title | Status | Hype
How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos | | 0
Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network | | 0
MM-Ego: Towards Building Egocentric Multimodal LLMs | | 0
How Can Objects Help Video-Language Understanding? | | 0
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving | | 0
HLVU : A New Challenge to Test Deep Understanding of Movies the Way Humans do | | 0
Highlight Timestamp Detection Model for Comedy Videos via Multimodal Sentiment Analysis | | 0
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding | | 0
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search | | 0
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | | 0
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding | | 0
Deep Spatio-Temporal Random Fields for Efficient Video Segmentation | | 0
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding | | 0
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | | 0
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark | | 0
Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions | | 0
HFGCN:Hypergraph Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition | | 0
Deep learning for action spotting in association football videos | | 0
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model | | 0
DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | | 0
Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions | | 0
Cycle-Contrast for Self-Supervised Video Representation Learning | | 0
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset | | 0
HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions | | 0
Aggregating Frame-level Features for Large-Scale Video Classification | | 0

No leaderboard results yet.