Video Understanding

A crucial task in Video Understanding is to recognise and localise, in space and time, the actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 901–925 of 1149 papers

Title	Date	Tasks	Status
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding	Apr 15, 2025	Semantic Segmentation, Video Generation	Unverified
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding	Apr 20, 2025	Language Modeling	Unverified
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling	Jul 19, 2019	Benchmarking, Motion Estimation	Unverified
On the Limitations of Vision-Language Models in Understanding Image Transforms	Mar 12, 2025	Question Answering, Video Generation	Unverified
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting	Apr 26, 2024	Facial Expression Recognition, Multi-Task Learning	Unverified
Open Vocabulary Multi-Label Video Classification	Jul 12, 2024	Action Classification, Classification	Unverified
Open-Vocabulary Spatio-Temporal Action Detection	May 17, 2024	Action Detection, Fine-Grained Action Detection	Unverified
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering	Feb 13, 2025	Classification, Prompt Engineering	Unverified
Overview of Tencent Multi-modal Ads Video Understanding Challenge	Sep 16, 2021	Multi-Label Classification	Unverified
Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering	May 1, 2022	Question Answering, Video Classification	Unverified
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders	May 30, 2025	Video Understanding	Unverified
Time Blindness: Why Video-Language Models Can't See What Humans Can?	May 30, 2025	Temporal Sequences, Video Understanding	Unverified
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding	Apr 2, 2025	Video Understanding	Unverified
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation	Apr 24, 2025	Caption Generation, Dense Video Captioning	Unverified
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs	Mar 13, 2025	Benchmarking, Question Answering	Unverified
Toward a Human-Level Video Understanding Intelligence	Oct 8, 2021	AI Agent, Video Understanding	Unverified
Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder	Sep 20, 2024	Activity Recognition, Diagnostic	Unverified
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking	Apr 11, 2025	Moment Retrieval, Question Answering	Unverified
Towards Fine-Grained Video Question Answering	Mar 10, 2025	Language Modeling	Unverified
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset	Jun 19, 2024	Language Modeling	Unverified
Towards Long Video Understanding via Fine-detailed Video Story Generation	Dec 9, 2024	Story Generation, Video Understanding	Unverified
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition	Mar 17, 2025	Action Recognition, Video Recognition	Unverified
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition	Jun 9, 2021	Action Recognition, Point Cloud Classification	Unverified
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection	Mar 5, 2025	Anomaly Detection, Object	Unverified
Transformed ROIs for Capturing Visual Transformations in Videos	Jun 6, 2021	Action Recognition, Video Understanding	Unverified

No leaderboard results yet.