SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1011–1020 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
On the Limitations of Vision-Language Models in Understanding Image Transforms	Mar 12, 2025	Question AnsweringVideo Generation	—Unverified	0	0
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting	Apr 26, 2024	Facial Expression RecognitionMulti-Task Learning	—Unverified	0	0
Open Vocabulary Multi-Label Video Classification	Jul 12, 2024	Action ClassificationClassification	—Unverified	0	0
Open-Vocabulary Spatio-Temporal Action Detection	May 17, 2024	Action DetectionFine-Grained Action Detection	—Unverified	0	0
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering	Feb 13, 2025	ClassificationPrompt Engineering	—Unverified	0	0
Overview of Tencent Multi-modal Ads Video Understanding Challenge	Sep 16, 2021	Multi-Label ClassificationMUlTI-LABEL-ClASSIFICATION	—Unverified	0	0
Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering	May 1, 2022	Question AnsweringVideo Classification	—Unverified	0	0
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track	Dec 15, 2024	Image CaptioningMedical Question Answering	—Unverified	0	0
OV-HHIR: Open Vocabulary Human Interaction Recognition Using Cross-modal Integration of Large Language Models	Dec 31, 2024	Activity RecognitionHuman Interaction Recognition	—Unverified	0	0
OW-VISCapTor: Abstractors for Open-World Video Instance Segmentation and Captioning	Apr 4, 2024	DescriptiveDiversity	—Unverified	0	0

Show:10 25 50

← PrevPage 102 of 115Next →

No leaderboard results yet.