Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 331–340 of 1149 papers

Title	Date	Tasks	Status	Hype
Towards Universal Soccer Video Understanding	Dec 2, 2024	Action Classification, Sports Understanding	Code Available	3
VideoSAVi: Self-Aligned Video Language Models without Human Supervision	Dec 1, 2024	EgoSchema, MVBench	Unverified	0
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation	Dec 1, 2024	Instruction Following, Video Understanding	Unverified	0
STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training	Nov 29, 2024	Question Answering, Video Understanding	Unverified	0
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos	Nov 29, 2024	Boundary Detection, Dense Video Captioning	Code Available	2
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark	Nov 29, 2024	Benchmarking, Grounded Video Question Answering	Unverified	0
T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	Nov 29, 2024	Data Augmentation, Diversity	Code Available	1
Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing	Nov 29, 2024	AllForm	Unverified	0
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability	Nov 27, 2024	Temporal Localization, Video Understanding	Code Available	2
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context	Nov 25, 2024	Large Language Model, MME	Unverified	0

