
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 401–450 of 1149 papers

Title | Status | Hype
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations | Code | 0
ECO: Efficient Convolutional Network for Online Video Understanding | Code | 0
(Un)likelihood Training for Interpretable Embedding | Code | 0
Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images | Code | 0
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark | Code | 0
ACVUBench: Audio-Centric Video Understanding Benchmark | Code | 0
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos | Code | 0
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition | Code | 0
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Code | 0
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube | Code | 0
DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Code | 0
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | Code | 0
Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | Code | 0
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality | Code | 0
Don't Judge by the Look: Towards Motion Coherent Video Representation | Code | 0
Tiny Video Networks | Code | 0
The YouTube-8M Kaggle Competition: Challenges and Methods | Code | 0
The Visual Centrifuge: Model-Free Layered Video Representations | Code | 0
Temporal Tessellation: A Unified Approach for Video Analysis | Code | 0
The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge | Code | 0
In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Code | 0
Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding | Code | 0
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding | Code | 0
Temporally smooth online action detection using cycle-consistent future anticipation | Code | 0
Temporal Action Proposal Generation With Action Frequency Adaptive Network | Code | 0
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot | Code | 0
Diagnosing Error in Temporal Action Detectors | Code | 0
Telling Stories for Common Sense Zero-Shot Action Recognition | Code | 0
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models | Code | 0
Technical Report for CVPR 2022 LOVEU AQTC Challenge | Code | 0
4D Generic Video Object Proposals | Code | 0
Detection-Fusion for Knowledge Graph Extraction from Videos | Code | 0
https://arxiv.org/abs/2407.00634 | Code | 0
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Code | 0
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios | Code | 0
Detect-and-Track: Efficient Pose Estimation in Videos | Code | 0
Task-Aware KV Compression For Cost-Effective Long Video Understanding | Code | 0
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Code | 0
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach | Code | 0
HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding | Code | 0
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Code | 0
Deep Learning Methods for Efficient Large Scale Video Labeling | Code | 0
Hierarchical Deep Recurrent Architecture for Video Understanding | Code | 0
Streaming Detection of Queried Event Start | Code | 0
Video action detection by learning graph-based spatio-temporal interactions | Code | 0
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Code | 0
Spatio-Temporal Perturbations for Video Attribution | Code | 0
Hallucination Mitigation Prompts Long-term Video Understanding | Code | 0
SoccerNet 2024 Challenges Results | Code | 0
Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation | Code | 0

No leaderboard results yet.