
Video Understanding

A crucial task in video understanding is to recognise and localise, in both space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1101–1125 of 1149 papers

Title | Status | Hype
Pooled Motion Features for First-Person Videos | Code | 0
End-to-End Learning of Motion Representation for Video Understanding | Code | 0
A Coding Framework and Benchmark towards Low-Bitrate Video Understanding | Code | 0
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark | Code | 0
EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0
On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis | Code | 0
ECO: Efficient Convolutional Network for Online Video Understanding | Code | 0
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions | Code | 0
DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Code | 0
Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification | Code | 0
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | Code | 0
Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos | Code | 0
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels | Code | 0
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification | Code | 0
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding | Code | 0
Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | Code | 0
Multimodal Dialogue State Tracking | Code | 0
Don't Judge by the Look: Towards Motion Coherent Video Representation | Code | 0
(Un)likelihood Training for Interpretable Embedding | Code | 0
Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images | Code | 0
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Code | 0
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
Diagnosing Error in Temporal Action Detectors | Code | 0
Multi-attention Networks for Temporal Localization of Video-level Labels | Code | 0

No leaderboard results yet.