SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 821-830 of 1149 papers

Title | Status | Hype
Flexible Frame Selection for Efficient Video Reasoning | | 0
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding | | 0
FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering | | 0
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions | | 0
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles | | 0
Frame-Voyager: Learning to Query Frames for Video Large Language Models | | 0
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | | 0
From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction | | 0
From Image to Video, what do we need in multimodal LLMs? | | 0
From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations | | 0

No leaderboard results yet.