SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 801–825 of 1149 papers

Title	Date	Tasks	Status
Egocentric and Exocentric Methods: A Short Survey	Oct 27, 2024	Action RecognitionSurvey	—Unverified
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understandingHallucination	—Unverified
Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition	Jun 27, 2018	Action RecognitionTemporal Action Localization	—Unverified
Exploring Anchor-based Detection for Ego4D Natural Language Query	Aug 10, 2022	Video Understanding	—Unverified
Exploring Missing Modality in Multimodal Egocentric Datasets	Jan 21, 2024	Action RecognitionVideo Understanding	—Unverified
Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022	Nov 16, 2022	Human-Object Interaction DetectionObject	—Unverified
Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding	Jan 28, 2025	DecoderVideo Understanding	—Unverified
Extending Video Masked Autoencoders to 128 frames	Nov 20, 2024	DecoderVideo Understanding	—Unverified
Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding	Aug 11, 2017	Action DetectionAction Recognition	—Unverified
Real-Time Segmentation Networks should be Latency Aware	Apr 6, 2020	Autonomous VehiclesScene Segmentation	—Unverified
Fast Retinomorphic Event Stream for Video Recognition and Reinforcement Learning	May 16, 2018	Action RecognitionAtari Games	—Unverified
FaVChat: Unlocking Fine-Grained Facail Video Understanding with Multimodal Large Language Models	Mar 12, 2025	Mixture-of-ExpertsQuestion Answering	—Unverified
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Mar 19, 2025	BenchmarkingMultiple-choice	—Unverified
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models	Jun 12, 2024	Video Understanding	—Unverified
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework	Nov 16, 2021	Multiple-choiceQuestion Answering	—Unverified
Fine-Grain Annotation of Cricket Videos	Nov 24, 2015	Action RecognitionRetrieval	—Unverified
Fine-Grained Video Captioning through Scene Graph Consolidation	Feb 23, 2025	Caption GenerationImage Captioning	—Unverified
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval	Dec 31, 2024	RetrievalText Retrieval	—Unverified
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge	Sep 20, 2024	Multiple-choiceQuestion Answering	—Unverified
Flatten: Video Action Recognition is an Image Classification task	Aug 17, 2024	Action Recognitionimage-classification	—Unverified
Flexible Frame Selection for Efficient Video Reasoning	Jan 1, 2025	Language ModelingLanguage Modelling	—Unverified
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding	Jun 1, 2025	Video Understanding	—Unverified
FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering	Dec 17, 2024	Language ModelingLanguage Modelling	—Unverified
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions	Sep 7, 2022	Image GenerationText to Image Generation	—Unverified
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles	May 22, 2025	EgoSchemaFew-Shot Learning	—Unverified

Show:10 25 50

← PrevPage 33 of 46Next →

No leaderboard results yet.