SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 451475 of 1149 papers

TitleStatusHype
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering0
Interpretable Action Recognition on Hard to Classify Actions0
Localizing Events in Videos with Multimodal Queries0
In-the-Wild Video Question Answering0
Long Activity Video Understanding using Functional Object-Oriented Network0
IPAD: Industrial Process Anomaly Detection Dataset0
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation0
IQViC: In-context, Question Adaptive Vision Compressor for Long-term Video Understanding LMMs0
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model0
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition0
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output0
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?0
DLM-VMTL:A Double Layer Mapper for heterogeneous data video Multi-task prompt learning0
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection0
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning0
Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals0
Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection0
Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input0
Instrument-tissue Interaction Detection Framework for Surgical Video Understanding0
KeyVideoLLM: Towards Large-scale Video Keyframe Selection0
InstructionBench: An Instructional Video Understanding Benchmark0
KnowIT VQA: Answering Knowledge-Based Questions about Videos0
Knowledge-Based Visual Question Answering in Videos0
Koala: Key frame-conditioned long video-LLM0
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding0
Show:102550
← PrevPage 19 of 46Next →

No leaderboard results yet.