
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1081-1090 of 1149 papers

Title | Hype
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding | 0
ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | 0
SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation | 0
Skimming and Scanning for Untrimmed Video Action Recognition | 0
Slicing Convolutional Neural Network for Crowd Video Understanding | 0
Slot-VLM: SlowFast Slots for Video-Language Modeling | 0
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding | 0
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | 0
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding | 0
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs | 0

No leaderboard results yet.