Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1026–1050 of 1149 papers

Title	Date	Tasks	Status
Judging a video by its bitstream cover	Sep 14, 2023	Video Understanding	Code Available
CARPe Posterum: A Convolutional Approach for Real-time Pedestrian Path Prediction	May 26, 2020	Autonomous Vehicles, Prediction	Code Available
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model	Jul 9, 2024	Video Understanding	Code Available
Joint Event Detection and Description in Continuous Video Streams	Feb 28, 2018	Dense Captioning, Dense Video Captioning	Code Available
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning	Apr 15, 2018	Video Captioning, Video Understanding	Code Available
Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition	Jan 25, 2022	Action Recognition, Optical Flow Estimation	Code Available
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens	Dec 13, 2024	Language Modeling, Language Modelling	Code Available
In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition	Apr 14, 2024	Action Recognition, Hand Pose Estimation	Code Available
ViP: Video Platform for PyTorch	Oct 7, 2019	Benchmarking, Video Understanding	Code Available
ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation	May 21, 2025	Decision Making, Language Modeling	Code Available
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision	Jun 6, 2025	Video Understanding	Code Available
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models	Jun 28, 2023	Retrieval, Video Retrieval	Code Available
https://arxiv.org/abs/2407.00634	Jul 2, 2024	Video Captioning, Video Description	Code Available
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios	Oct 18, 2022	Video Understanding	Code Available
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios	Jun 11, 2025	Action Recognition, Action Segmentation	Code Available
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations	Mar 25, 2025	Representation Learning, Video Understanding	Code Available
HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding	Jan 3, 2025	Question Answering, Video Understanding	Code Available
The Visual Centrifuge: Model-Free Layered Video Representations	Dec 4, 2018	Color Constancy, model	Code Available
The YouTube-8M Kaggle Competition: Challenges and Methods	Jun 28, 2017	General Classification, Video Classification	Code Available
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model	Jun 15, 2024	Question Answering, Video Understanding	Code Available
The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge	Jun 16, 2017	General Classification, Video Classification	Code Available
Hierarchical Deep Recurrent Architecture for Video Understanding	Jul 11, 2017	Classification, General Classification	Code Available
Temporal Tessellation: A Unified Approach for Video Analysis	Dec 21, 2016	Action Detection, Video Captioning	Code Available
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding	May 19, 2025	Language Modeling, Language Modelling	Code Available
Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding	Jul 14, 2017	Video Recognition, Video Understanding	Code Available
