SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1101–1149 of 1149 papers

Title | Status | Hype
Pooled Motion Features for First-Person Videos | Code | 0
End-to-End Learning of Motion Representation for Video Understanding | Code | 0
A Coding Framework and Benchmark towards Low-Bitrate Video Understanding | Code | 0
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark | Code | 0
EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0
On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis | Code | 0
ECO: Efficient Convolutional Network for Online Video Understanding | Code | 0
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions | Code | 0
DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Code | 0
Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification | Code | 0
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | Code | 0
Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos | Code | 0
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels | Code | 0
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification | Code | 0
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding | Code | 0
Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | Code | 0
Multimodal Dialogue State Tracking | Code | 0
Don't Judge by the Look: Towards Motion Coherent Video Representation | Code | 0
(Un)likelihood Training for Interpretable Embedding | Code | 0
Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images | Code | 0
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Code | 0
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
Diagnosing Error in Temporal Action Detectors | Code | 0
Multi-attention Networks for Temporal Localization of Video-level Labels | Code | 0
MOFO: MOtion FOcused Self-Supervision for Video Understanding | Code | 0
MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization | Code | 0
Detection-Fusion for Knowledge Graph Extraction from Videos | Code | 0
Vamos: Versatile Action Models for Video Understanding | Code | 0
Are current long-term video understanding datasets long-term? | Code | 0
Audio Caption in a Car Setting with a Sentence-Level Loss | Code | 0
VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models | Code | 0
VideoDG: Generalizing Temporal Relations in Videos to Novel Domains | Code | 0
Detect-and-Track: Efficient Pose Estimation in Videos | Code | 0
MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Code | 0
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding | Code | 0
Deep Learning Methods for Efficient Large Scale Video Labeling | Code | 0
Creative Flow+ Dataset | Code | 0
Contextual Explainable Video Representation: Human Perception-based Understanding | Code | 0
A Challenge to Build Neuro-Symbolic Video Agents | Code | 0
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding | Code | 0
Constrained-size Tensorflow Models for YouTube-8M Video Understanding Challenge | Code | 0
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022 | Code | 0
Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection | Code | 0
SoccerDB: A Large-Scale Database for Comprehensive Video Understanding | Code | 0
Video Action Understanding | Code | 0
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Code | 0
Long-Term Feature Banks for Detailed Video Understanding | Code | 0
Localizing Moments in Video with Temporal Language | Code | 0

No leaderboard results yet.