SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 501-525 of 1149 papers

Title | Status | Hype
On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification | Code | 0
Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos | Code | 0
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels | Code | 0
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization | Code | 0
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions | Code | 0
CARPe Posterum: A Convolutional Approach for Real-time Pedestrian Path Prediction | Code | 0
Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition | Code | 0
Are current long-term video understanding datasets long-term? | Code | 0
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding | Code | 0
Enhancing Temporal Modeling of Video LLMs via Time Gating | Code | 0
Multi-attention Networks for Temporal Localization of Video-level Labels | Code | 0
MOFO: MOtion FOcused Self-Supervision for Video Understanding | Code | 0
Multimodal Dialogue State Tracking | Code | 0
End-to-End Learning of Motion Representation for Video Understanding | Code | 0
MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Code | 0
MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization | Code | 0
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens | Code | 0
EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding | Code | 0
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022 | Code | 0
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
Long-Term Feature Banks for Detailed Video Understanding | Code | 0

No leaderboard results yet.