Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1076–1100 of 1149 papers

Title	Date	Tasks	Status
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction	Nov 28, 2018	Action Recognition, Prediction	Unverified
Self-supervised video pretraining yields robust and more human-aligned visual representations	Oct 12, 2022	Contrastive Learning, object-detection	Unverified
Semantics-aware Test-time Adaptation for 3D Human Pose Estimation	Feb 15, 2025	3D human pose and shape estimation, 3D Human Pose Estimation	Unverified
Semantic Segmentation on VSPW Dataset through Masked Video Consistency	Jun 7, 2024	Semantic Segmentation, Video Understanding	Unverified
Semi-Parametric Video-Grounded Text Generation	Jan 27, 2023	Language Modeling, Language Modelling	Unverified
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding	Apr 10, 2025	Video Understanding	Unverified
ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries	Dec 17, 2024	Human Detection, image-classification	Unverified
SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation	May 13, 2025	Computational Efficiency, Video Understanding	Unverified
Skimming and Scanning for Untrimmed Video Action Recognition	Apr 21, 2021	Action Recognition, Temporal Action Localization	Unverified
Slicing Convolutional Neural Network for Crowd Video Understanding	Jun 1, 2016	Attribute, Video Understanding	Unverified
Slot-VLM: SlowFast Slots for Video-Language Modeling	Feb 20, 2024	Language Modeling, Language Modelling	Unverified
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding	Mar 24, 2025	Form, Video Understanding	Unverified
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability	Mar 18, 2025	Language Modeling, Language Modelling	Unverified
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding	Nov 30, 2023	Form, Video Retrieval	Unverified
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs	May 25, 2025	Video Understanding	Unverified
Spatio-Temporal Context for Action Detection	Jun 29, 2021	Action Detection, Video Understanding	Unverified
Spatio-Temporal Crop Aggregation for Video Representation Learning	Nov 30, 2022	Action Classification, Dimensionality Reduction	Unverified
Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos	Jul 1, 2017	Action Recognition, Action Recognition In Videos	Unverified
Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction	Oct 3, 2021	Action Recognition, Representation Learning	Unverified
Speeding Up Action Recognition Using Dynamic Accumulation of Residuals in Compressed Domain	Sep 29, 2022	Action Recognition, Video Understanding	Unverified
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos	Aug 9, 2024	Active Speaker Localization, Decoder	Unverified
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions	May 10, 2021	Contrastive Learning, Retrieval	Unverified
SPOT! Revisiting Video-Language Models for Event Understanding	Nov 21, 2023	Attribute, Video Understanding	Unverified
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips	Dec 2, 2021	Action Recognition, Video Understanding	Unverified
STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training	Nov 29, 2024	Question Answering, Video Understanding	Unverified

Page 44 of 46

No leaderboard results yet.