SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 71-80 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding | | 0 |
| QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | Code | 2 |
| SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding | Code | 0 |
| Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning | Code | 1 |
| Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles | | 0 |
| ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation | Code | 0 |
| ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning | | 0 |
| Leveraging Foundation Models for Multimodal Graph-Based Action Recognition | | 0 |
| LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval | | 0 |
| Clapper: Compact Learning and Video Representation in VLMs | | 0 |

No leaderboard results yet.