SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 161-170 of 1149 papers

Title | Status | Hype
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Code | 0
Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization | - | 0
Temporal Action Detection Model Compression by Progressive Block Drop | - | 0
PVChat: Personalized Video Chat with One-Shot Learning | - | 0
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding | Code | 1
Agentic Keyframe Search for Video Question Answering | Code | 1
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering | - | 0
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations | - | 0
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation? | - | 0

No leaderboard results yet.