
Action Localization

Action localization is the task of finding the spatial and temporal coordinates of an action in a video. An action localization model identifies the frames in which an action starts and ends, and returns the x, y coordinates of the action in those frames. The coordinates change from frame to frame when the object performing the action moves.
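As a rough illustration of the output described above, the sketch below stores one localized action as a label plus a per-frame bounding box. The names `ActionTube`, `Box`, `start_frame`, `end_frame` and `center_at`, the box layout, and the sample values are all assumptions made for this example; they are not taken from any paper or library listed on this page.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class ActionTube:
    """One localized action: a label and a box for every frame it spans."""
    label: str
    boxes: Dict[int, Box]  # frame index -> box in that frame

    @property
    def start_frame(self) -> int:
        # Temporal start: the first frame that has a box.
        return min(self.boxes)

    @property
    def end_frame(self) -> int:
        # Temporal end: the last frame that has a box.
        return max(self.boxes)

    def center_at(self, frame: int) -> Tuple[float, float]:
        # Spatial location: the x, y center of the box in the given frame.
        x_min, y_min, x_max, y_max = self.boxes[frame]
        return ((x_min + x_max) / 2, (y_min + y_max) / 2)


# Made-up example: the actor moves to the right, so x grows frame by frame.
tube = ActionTube(
    label="running",
    boxes={
        10: (20.0, 30.0, 40.0, 50.0),
        11: (24.0, 30.0, 44.0, 50.0),
        12: (28.0, 30.0, 48.0, 50.0),
    },
)

print(tube.start_frame, tube.end_frame)
print(tube.center_at(10), tube.center_at(12))
```

A purely temporal localization model would report only the start and end frames; the per-frame boxes are what add the spatial part and let the coordinates follow the actor as it moves.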

Papers

Showing 1-50 of 369 papers

Title | Status | Hype
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Code | 2
Test-Time Zero-Shot Temporal Action Localization | Code | 2
Temporal Action Localization with Enhanced Instant Discriminability | Code | 2
NMS Threshold matters for Ego4D Moment Queries -- 2nd place solution to the Ego4D Moment Queries Challenge 2023 | Code | 2
Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge | Code | 2
Structured Attention Composition for Temporal Action Localization | Code | 2
ActionFormer: Localizing Moments of Actions with Transformers | Code | 2
Zero-Shot Temporal Interaction Localization for Egocentric Videos | Code | 1
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1
XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Code | 1
Temporal Action Localization with Cross Layer Task Decoupling and Refinement | Code | 1
Open-Vocabulary Action Localization with Iterative Visual Prompting | Code | 1
Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization | Code | 1
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism | Code | 1
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos | Code | 1
Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization | Code | 1
Referring Atomic Video Action Recognition | Code | 1
EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding | Code | 1
SFMViT: SlowFast Meet ViT in Chaotic World | Code | 1
UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization | Code | 1
ASTRA: An Action Spotting TRAnsformer for Soccer Videos | Code | 1
Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization | Code | 1
Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach | Code | 1
Temporal Action Localization for Inertial-based Human Activity Recognition | Code | 1
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation | Code | 1
HR-Pro: Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation | Code | 1
DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization | Code | 1
Actionness Inconsistency-guided Contrastive Learning for Weakly-supervised Temporal Action Localization | Code | 1
Multi-Granularity Hand Action Detection | Code | 1
Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization | Code | 1
Boosting Weakly-Supervised Temporal Action Localization with Text Information | Code | 1
Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels | Code | 1
WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition | Code | 1
TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization | Code | 1
Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events | Code | 1
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization | Code | 1
SimOn: A Simple Framework for Online Temporal Action Localization | Code | 1
EgoTaskQA: Understanding Human Tasks in Egocentric Videos | Code | 1
Entity-aware and Motion-aware Transformers for Language-driven Action Localization in Videos | Code | 1
Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization | Code | 1
E^2TAD: An Energy-Efficient Tracking-based Action Detector | Code | 1
TALLFormer: Temporal Action Localization with a Long-memory Transformer | Code | 1
Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization | Code | 1
Unsupervised Pre-training for Temporal Action Localization Tasks | Code | 1
OpenTAL: Towards Open Set Temporal Action Localization | Code | 1
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation | Code | 1
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval | Code | 1
Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency | Code | 1

No leaderboard results yet.