
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 651–675 of 1149 papers

Title	Date	Tasks	Status
A Multimodal Sentiment Dataset for Video Recommendation	Sep 17, 2021	Multimodal Sentiment Analysis, Sentiment Analysis	Unverified
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks	Oct 7, 2023	Action Recognition, Multiple-choice	Unverified
An Attempt towards Interpretable Audio-Visual Video Captioning	Dec 7, 2018	Audio captioning, Audio-Visual Video Captioning	Unverified
An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform	Jun 26, 2017	Classification, Deep Learning	Unverified
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection	Jul 21, 2022	Action Detection, Video Understanding	Unverified
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes	Apr 21, 2025	MME, Video MME	Unverified
Anticipating Object State Changes in Long Procedural Videos	May 21, 2024	Object, Object State Change Classification	Unverified
Apollo: An Exploration of Video Understanding in Large Multimodal Models	Dec 13, 2024	MME, Video MME	Unverified
APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval	Jun 5, 2025	Information Retrieval, Retrieval	Unverified
Artificial intelligence optical hardware empowers high-resolution hyperspectral video understanding at 1.2 Tb/s	Dec 17, 2023	Semantic Segmentation, Video Semantic Segmentation	Unverified
A SPIKING SEQUENTIAL MODEL: RECURRENT LEAKY INTEGRATE-AND-FIRE	Sep 25, 2019	Text Summarization, Video Understanding	Unverified
A Structured Model For Action Detection	Dec 9, 2018	Action Detection, model	Unverified
A Study On the Effects of Pre-processing On Spatio-temporal Action Recognition Using Spiking Neural Networks Trained with STDP	May 31, 2021	Action Recognition, Spatio-temporal Action Recognition	Unverified
A Survey on Backbones for Deep Video Action Recognition	May 9, 2024	Action Recognition, Diversity	Unverified
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming	Jan 30, 2024	Video Generation, Video Understanding	Unverified
A Survey on Mamba Architecture for Vision Applications	Feb 11, 2025	Mamba, object-detection	Unverified
A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems	Feb 10, 2025	Autonomous Driving, Edge-computing	Unverified
A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos	Apr 10, 2024	Activity Recognition, Gaze Prediction	Unverified
Attend and Interact: Higher-Order Object Interactions for Video Understanding	Nov 16, 2017	Action Classification, Action Recognition	Unverified
Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification	Aug 26, 2024	Video Classification, Video Understanding	Unverified
Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion	Jan 1, 2021	Time Series, Time Series Analysis	Unverified
Audio-Visual Glance Network for Efficient Video Recognition	Aug 18, 2023	Video Recognition, Video Understanding	Unverified
Audio-Visual LLM for Video Understanding	Dec 11, 2023	AudioCaps, Language Modeling	Unverified
Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations	Feb 21, 2022	Answer Generation, Video Understanding	Unverified
Audio-visual training for improved grounding in video-text LLMs	Jul 21, 2024	Video Understanding	Unverified

