Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1051–1075 of 1149 papers

Title	Date	Tasks	Status
Recurring the Transformer for Video Action Recognition	Jan 1, 2022	Action Recognition, GPU	Unverified
Relational Space-Time Query in Long-Form Videos	Jan 1, 2023	Form, Video Understanding	Unverified
Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition	Aug 3, 2020	Action Recognition, Optical Flow Estimation	Unverified
ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task	Apr 20, 2025	Language Modeling, Language Modelling	Unverified
Rethinking Image-to-Video Adaptation: An Object-centric Perspective	Jul 9, 2024	Action Recognition, Object	Unverified
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data	Jul 18, 2024	Language Modelling, Large Language Model	Unverified
Retrieval-based Video Language Model for Efficient Long Video Question Answering	Dec 8, 2023	Language Modeling, Language Modelling	Unverified
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning	May 11, 2024	Image-text matching, Retrieval	Unverified
Revealing Occlusions with 4D Neural Fields	Apr 22, 2022	Video Understanding	Unverified
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding	Sep 20, 2023	Action Localization, Form	Unverified
ReWind: Understanding Long Videos with Instructed Learnable Memory	Nov 23, 2024	Large Language Model, Question Answering	Unverified
SA-NET.v2: Real-time vehicle detection from oblique UAV images with use of uncertainty estimation in deep meta-learning	Aug 4, 2022	Meta-Learning, Semantic Segmentation	Unverified
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context	Nov 25, 2024	Large Language Model, MME	Unverified
Scene-centric Joint Parsing of Cross-view Videos	Sep 16, 2017	Video Understanding	Unverified
Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis	May 31, 2025	Scene Segmentation, Segmentation	Unverified
SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding	Jun 9, 2025	RAG, Retrieval	Unverified
MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization	Apr 6, 2022	Action Localization, Action Recognition	Unverified
SEAL: Semantic Attention Learning for Long Video Representation	Dec 2, 2024	Diversity, Question Answering	Unverified
Search-Map-Search: A Frame Selection Paradigm for Action Recognition	Apr 20, 2023	Action Recognition, Heuristic Search	Unverified
Seed1.5-VL Technical Report	May 11, 2025	Mixture-of-Experts, Multimodal Reasoning	Unverified
Selective Structured State-Spaces for Long-Form Video Understanding	Mar 25, 2023	Contrastive Learning, Form	Unverified
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization	Apr 16, 2025	Hallucination, Question Answering	Unverified
Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding	Mar 26, 2025	GPU, Question Answering	Unverified
Self-supervised Motion Representation via Scattering Local Motion Cues	Aug 1, 2020	Action Recognition, Optical Flow Estimation	Unverified
Self-Supervised Object Detection from Egocentric Videos	Jan 1, 2023	Class-agnostic Object Detection, Object	Unverified
