Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 91–100 of 1149 papers

Title	Date	Tasks	Status	Hype
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Dec 18, 2024	Reasoning Segmentation, Segmentation	Code Available	2
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition	Dec 15, 2024	Computational Efficiency, Video Recognition	Code Available	2
Neptune: The Long Orbit to Benchmarking Long Video Understanding	Dec 12, 2024	Benchmarking, Multimodal Reasoning	Code Available	2
LinVT: Empower Your Image-level Large Language Model to Understand Videos	Dec 6, 2024	Language Modeling, Language Modelling	Code Available	2
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning	Dec 4, 2024	Video Understanding	Code Available	2
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos	Nov 29, 2024	Boundary Detection, Dense Video Captioning	Code Available	2
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability	Nov 27, 2024	Temporal Localization, Video Understanding	Code Available	2
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding	Nov 6, 2024	Image Comprehension, Streaming video understanding	Code Available	2
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	Nov 4, 2024	Caption Generation, Multiple-choice	Code Available	2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Oct 25, 2024	EgoSchema, Hallucination	Code Available	2

No leaderboard results yet.