Video Understanding

A crucial task in video understanding is to recognise and localise (in space and time) the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 21–30 of 1149 papers

Title	Date	Tasks	Status	Hype
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos	Jul 17, 2024	Retrieval, Video Understanding	Code Available	4
Tarsier: Recipes for Training and Evaluating Large Video Description Models	Jun 30, 2024	Video Captioning, Video Description	Code Available	4
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results	Jun 24, 2024	Segmentation, Semantic Segmentation	Code Available	4
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering	Apr 26, 2024	2k, Question Answering	Code Available	4
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning	Apr 25, 2024	Dense Captioning, MVBench	Code Available	4
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	Apr 4, 2024	Language Modeling, Language Modelling	Code Available	4
SnAG: Scalable and Accurate Video Grounding	Apr 2, 2024	Video Grounding, Video Understanding	Code Available	4
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models	Mar 11, 2024	Computational Efficiency, Video Understanding	Code Available	4
Video Understanding with Large Language Models: A Survey	Dec 29, 2023	Survey, Video Understanding	Code Available	4
A Survey on Video Diffusion Models	Oct 16, 2023	Image Generation, Survey	Code Available	4

No leaderboard results yet.