Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 701–725 of 1149 papers

Title	Date	Tasks	Status
Extending Video Masked Autoencoders to 128 frames	Nov 20, 2024	Decoder, Video Understanding	Unverified
Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding	Aug 11, 2017	Action Detection, Action Recognition	Unverified
Real-Time Segmentation Networks should be Latency Aware	Apr 6, 2020	Autonomous Vehicles, Scene Segmentation	Unverified
Fast Retinomorphic Event Stream for Video Recognition and Reinforcement Learning	May 16, 2018	Action Recognition, Atari Games	Unverified
FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models	Mar 12, 2025	Mixture-of-Experts, Question Answering	Unverified
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Mar 19, 2025	Benchmarking, Multiple-choice	Unverified
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models	Jun 12, 2024	Video Understanding	Unverified
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework	Nov 16, 2021	Multiple-choice, Question Answering	Unverified
Fine-Grain Annotation of Cricket Videos	Nov 24, 2015	Action Recognition, Retrieval	Unverified
Fine-Grained Video Captioning through Scene Graph Consolidation	Feb 23, 2025	Caption Generation, Image Captioning	Unverified
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval	Dec 31, 2024	Retrieval, Text Retrieval	Unverified
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge	Sep 20, 2024	Multiple-choice, Question Answering	Unverified
Flatten: Video Action Recognition is an Image Classification task	Aug 17, 2024	Action Recognition, image-classification	Unverified
Flexible Frame Selection for Efficient Video Reasoning	Jan 1, 2025	Language Modeling, Language Modelling	Unverified
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding	Jun 1, 2025	Video Understanding	Unverified
FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering	Dec 17, 2024	Language Modeling, Language Modelling	Unverified
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions	Sep 7, 2022	Image Generation, Text to Image Generation	Unverified
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles	May 22, 2025	EgoSchema, Few-Shot Learning	Unverified
Frame-Voyager: Learning to Query Frames for Video Large Language Models	Oct 4, 2024	Question Answering, Video Question Answering	Unverified
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models	Apr 8, 2025	In-Context Learning, Instruction Following	Unverified
From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction	Apr 8, 2025	Game State Reconstruction, Jersey Number Recognition	Unverified
From Image to Video, what do we need in multimodal LLMs?	Apr 18, 2024	Video Understanding	Unverified
From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations	May 18, 2025	Video Editing, Video Understanding	Unverified
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment	Mar 26, 2025	Video Understanding	Unverified
Fully Automated Hand Hygiene Monitoring in Operating Room using 3D Convolutional Neural Network	Mar 20, 2020	Optical Flow Estimation, Transfer Learning	Unverified


No leaderboard results yet.