
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1051-1075 of 1149 papers

Title | Status | Hype
Temporally smooth online action detection using cycle-consistent future anticipation | Code | 0
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Code | 0
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube | Code | 0
Temporal Action Proposal Generation With Action Frequency Adaptive Network | Code | 0
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot | Code | 0
Telling Stories for Common Sense Zero-Shot Action Recognition | Code | 0
Technical Report for CVPR 2022 LOVEU AQTC Challenge | Code | 0
Tiny Video Networks | Code | 0
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Code | 0
Task-Aware KV Compression For Cost-Effective Long Video Understanding | Code | 0
TAda! Temporally-Adaptive Convolutions for Video Understanding | Code | 0
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Code | 0
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach | Code | 0
Streaming Detection of Queried Event Start | Code | 0
Hallucination Mitigation Prompts Long-term Video Understanding | Code | 0
Gaussian Temporal Awareness Networks for Action Localization | Code | 0
FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos | Code | 0
Video action detection by learning graph-based spatio-temporal interactions | Code | 0
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Code | 0
Spatio-Temporal Perturbations for Video Attribution | Code | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
SoccerNet 2024 Challenges Results | Code | 0
Few-Shot Referring Relationships in Videos | Code | 0
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality | Code | 0
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding | Code | 0

No leaderboard results yet.