SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 711–720 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
Causal Reasoning Meets Visual Representation Learning: A Prospective Study	Apr 26, 2022	BenchmarkingOut-of-Distribution Generalization	—Unverified	0	0
CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs	Jul 1, 2025	Text GenerationVideo Understanding	—Unverified	0	0
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Dec 16, 2024	HallucinationMultiple-choice	—Unverified	0	0
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis	May 14, 2024	4kGPU	—Unverified	0	0
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos	Apr 25, 2018	General ClassificationVideo Classification	—Unverified	0	0
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System	Apr 27, 2023	Video Understanding	—Unverified	0	0
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jul 14, 2025	Large Language ModelMultimodal Large Language Model	—Unverified	0	0
CinePile: A Long Video Question Answering Dataset and Benchmark	May 14, 2024	FormHuman-Object Interaction Detection	—Unverified	0	0
Clapper: Compact Learning and Video Representation in VLMs	May 21, 2025	Video Understanding	—Unverified	0	0
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation	Mar 19, 2021	ObjectReferring Expression Segmentation	—Unverified	0	0

Show:10 25 50

← PrevPage 72 of 115Next →

No leaderboard results yet.