SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 401-425 of 1149 papers

Title | Status | Hype
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations | Code | 0
ECO: Efficient Convolutional Network for Online Video Understanding | Code | 0
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality | Code | 0
Tiny Video Networks | Code | 0
ACVUBench: Audio-Centric Video Understanding Benchmark | Code | 0
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Code | 0
The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge | Code | 0
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube | Code | 0
The Visual Centrifuge: Model-Free Layered Video Representations | Code | 0
DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Code | 0
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | Code | 0
Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | Code | 0
Temporal Tessellation: A Unified Approach for Video Analysis | Code | 0
The YouTube-8M Kaggle Competition: Challenges and Methods | Code | 0
Don't Judge by the Look: Towards Motion Coherent Video Representation | Code | 0
Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding | Code | 0
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding | Code | 0
Temporally smooth online action detection using cycle-consistent future anticipation | Code | 0
Technical Report for CVPR 2022 LOVEU AQTC Challenge | Code | 0
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Code | 0
Telling Stories for Common Sense Zero-Shot Action Recognition | Code | 0
Task-Aware KV Compression For Cost-Effective Long Video Understanding | Code | 0
Temporal Action Proposal Generation With Action Frequency Adaptive Network | Code | 0
Diagnosing Error in Temporal Action Detectors | Code | 0
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot | Code | 0

No leaderboard results yet.