SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 601-610 of 1149 papers

Title | Status | Hype
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation? | | 0
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets | | 0
When Work Matters: Transforming Classical Network Structures to Graph CNN | | 0
WildQA: In-the-Wild Video Question Answering | | 0
Wolf: Captioning Everything with a World Summarization Framework | | 0
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | | 0
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs | | 0
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding | | 0
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset | | 0
YouTube-8M Video Understanding Challenge Approach and Applications | | 0

No leaderboard results yet.