SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 711-720 of 1149 papers

Title | Status | Hype
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | | 0
Flatten: Video Action Recognition is an Image Classification task | | 0
Flexible Frame Selection for Efficient Video Reasoning | | 0
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding | | 0
FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering | | 0
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions | | 0
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles | | 0
Frame-Voyager: Learning to Query Frames for Video Large Language Models | | 0
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | | 0

No leaderboard results yet.