SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 201–250 of 1149 papers

Title | Status | Hype
Is Appearance Free Action Recognition Possible? | Code | 1
MotionSqueeze: Neural Motion Feature Learning for Video Understanding | Code | 1
CyberV: Cybernetics for Test-time Scaling in Video Understanding | Code | 1
DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations | Code | 1
Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention | Code | 1
Learning Video Context as Interleaved Multimodal Sequences | Code | 1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Code | 1
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges | Code | 1
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1
Agentic Keyframe Search for Video Question Answering | Code | 1
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs | Code | 1
DEVIAS: Learning Disentangled Video Representations of Action and Scene | Code | 1
Localizing Moments in Long Video Via Multimodal Guidance | Code | 1
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding | Code | 1
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1
Learning Temporally Causal Latent Processes from General Temporal Data | Code | 1
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation | Code | 1
Disentangle Your Dense Object Detector | Code | 1
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models | Code | 1
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
Lightweight Network Architecture for Real-Time Action Recognition | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Code | 1
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma? | Code | 1
Do Language Models Understand Time? | Code | 1
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos | Code | 1
Large Scale Holistic Video Understanding | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation | Code | 1
Contrastive Masked Autoencoders for Self-Supervised Video Hashing | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
PAN: Towards Fast Action Recognition via Learning Persistence of Appearance | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
Event-Free Moving Object Segmentation from Moving Ego Vehicle | Code | 1
Panoramic Vision Transformer for Saliency Detection in 360° Videos | Code | 1
Dual-path Adaptation from Image to Video Transformers | Code | 1
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer | Code | 1
MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning | Code | 1
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning | Code | 1
MMAD: Multi-label Micro-Action Detection in Videos | Code | 1
Grounded Question-Answering in Long Egocentric Videos | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
Panoptic Video Scene Graph Generation | Code | 1
PAVE: Patching and Adapting Video Large Language Models | Code | 1
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding | Code | 1
Page 5 of 23

No leaderboard results yet.