SOTAVerified

Video Understanding

A crucial task in video understanding is to recognise and localise, in both space and time, the different actions or events appearing in a video.

Source: Action Detection from a Robot-Car Perspective
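The "localise in space and time" formulation above is commonly made concrete as action *tubes*: a per-frame sequence of bounding boxes sharing one action label, scored against ground truth by averaging box IoU over frames. A minimal sketch of that representation and metric (the class and function names here are illustrative, not taken from any listed paper):

```python
from dataclasses import dataclass, field

@dataclass
class ActionTube:
    """A spatio-temporal detection: one action label tracked across frames."""
    label: str                                 # action class, e.g. "crossing"
    boxes: dict = field(default_factory=dict)  # frame index -> (x1, y1, x2, y2)

def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def tube_iou(pred: ActionTube, gt: ActionTube):
    """Spatio-temporal overlap: mean box IoU over the union of frames.

    Frames covered by only one tube contribute 0, so temporal
    mislocalisation is penalised as well as spatial mislocalisation.
    """
    frames = set(pred.boxes) | set(gt.boxes)
    if not frames:
        return 0.0
    return sum(
        box_iou(pred.boxes[f], gt.boxes[f])
        if f in pred.boxes and f in gt.boxes else 0.0
        for f in frames
    ) / len(frames)
```

A detection is then typically counted as correct when its label matches and its tube IoU exceeds a threshold (0.5 is a common choice).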

Papers

Showing 281–290 of 1149 papers

Title | Status | Hype
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment | Code | 1
VEU-Bench: Towards Comprehensive Understanding of Video Editing | — | 0
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models | — | 0
OV-HHIR: Open Vocabulary Human Interaction Recognition Using Cross-modal Integration of Large Language Models | — | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | — | 0
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding | — | 0
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM | Code | 3
Online Video Understanding: OVBench and VideoChat-Online | Code | 2
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models | Code | 2
Detection-Fusion for Knowledge Graph Extraction from Videos | Code | 0
Page 29 of 115

No leaderboard results yet.