SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 671-680 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion | | 0 |
| Audio-Visual Glance Network for Efficient Video Recognition | | 0 |
| Audio-Visual LLM for Video Understanding | | 0 |
| Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations | | 0 |
| Audio-visual training for improved grounding in video-text LLMs | | 0 |
| Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation | | 0 |
| A Unified Framework for Human-centric Point Cloud Video Understanding | | 0 |
| A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset | | 0 |
| AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark | | 0 |
| Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | | 0 |

No leaderboard results yet.