
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 461–470 of 1149 papers

Title	Date	Tasks	Status	Hype
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding	Jun 28, 2024	Multiple-choice, Video Understanding	Code Available	1
Snakes and Ladders: Two Steps Up for VideoMamba	Jun 27, 2024	Action Recognition, Mamba	Code Available	1
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads	Jun 27, 2024	Diversity, image-classification	Code Available	1
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding	Jun 27, 2024	Decoder, Segmentation	Code Available	5
Zero-Shot Long-Form Video Understanding through Screenplay	Jun 25, 2024	Form, Question Answering	Unverified	0
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results	Jun 24, 2024	Segmentation, Semantic Segmentation	Code Available	4
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models	Jun 24, 2024	Hallucination, Video Understanding	Unverified	0
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer	Jun 24, 2024	AI Agent, Large Language Model	Code Available	2
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models	Jun 22, 2024	Diversity, Language Modeling	Code Available	0
Towards Event-oriented Long Video Understanding	Jun 20, 2024	Video Understanding	Code Available	1


No leaderboard results yet.