SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 441-450 of 1149 papers

Title | Status | Hype
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding | Code | 0
MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection | | 0
Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding | | 0
Universal Visuo-Tactile Video Understanding for Embodied Interaction | | 0
Two Causally Related Needles in a Video Haystack | | 0
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos | Code | 0
AdaTP: Attention-Debiased Token Pruning for Video Large Language Models | | 0
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs | | 0
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding | | 0
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding | Code | 0

No leaderboard results yet.