SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 601–650 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation? | | 0 |
| What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets | | 0 |
| When Work Matters: Transforming Classical Network Structures to Graph CNN | | 0 |
| WildQA: In-the-Wild Video Question Answering | | 0 |
| Wolf: Captioning Everything with a World Summarization Framework | | 0 |
| WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | | 0 |
| WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs | | 0 |
| X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding | | 0 |
| YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset | | 0 |
| YouTube-8M Video Understanding Challenge Approach and Applications | | 0 |
| ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection | | 0 |
| Zero-shot Action Localization via the Confidence of Large Vision-Language Models | | 0 |
| Zero-Shot Action Recognition in Surveillance Videos | | 0 |
| Zero-Shot Action Recognition in Videos: A Survey | | 0 |
| Zero-Shot Long-Form Video Understanding through Screenplay | | 0 |
| Zero-shot Shark Tracking and Biometrics from Aerial Imagery | | 0 |
| Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network | | 0 |
| Zero-Shot Video Question Answering with Procedural Programs | | 0 |
| 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | | 0 |
| Multimodal Fusion and Coherence Modeling for Video Topic Segmentation | | 0 |
| FE-Adapter: Adapting Image-based Emotion Classifiers to Videos | | 0 |
| An Analysis of Data Transformation Effects on Segment Anything 2 | | 0 |
| PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos | | 0 |
| 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation | | 0 |
| 3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark | | 0 |
| 3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation | | 0 |
| A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives | | 0 |
| Abductive Ego-View Accident Video Understanding for Safe Driving Perception | | 0 |
| ActAR: Actor-Driven Pose Embeddings for Video Action Recognition | | 0 |
| Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions | | 0 |
| Action Sensitivity Learning for Temporal Action Localization | | 0 |
| Action Understanding with Multiple Classes of Actors | | 0 |
| Actor-Action Semantic Segmentation with Grouping Process Models | | 0 |
| AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | | 0 |
| AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | | 0 |
| AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization | | 0 |
| Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception | | 0 |
| Adaptive Intermediate Representations for Video Understanding | | 0 |
| Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning | | 0 |
| AdaTP: Attention-Debiased Token Pruning for Video Large Language Models | | 0 |
| A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions | | 0 |
| Adversarial Machine Learning Attacks Against Video Anomaly Detection Systems | | 0 |
| Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter | | 0 |
| AE-Net: Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding | | 0 |
| AFO-TAD: Anchor-free One-Stage Detector for Temporal Action Detection | | 0 |
| Aggregating Frame-level Features for Large-Scale Video Classification | | 0 |
| AirLetters: An Open Video Dataset of Characters Drawn in the Air | | 0 |
| Aligned Better, Listen Better for Audio-Visual Large Language Models | | 0 |
| ALLVB: All-in-One Long Video Understanding Benchmark | | 0 |
| AMEGO: Active Memory from long EGOcentric videos | | 0 |
Page 13 of 23

No leaderboard results yet.