SOTAVerified

Temporal Sentence Grounding

Temporal sentence grounding (TSG) aims to locate a specific moment in an untrimmed video given a natural language query. Methods for this task are trained under different levels of supervision: 1) Weak supervision: a video-level action category set; 2) Semi-weak supervision: a video-level action category set plus action annotations at several timestamps; 3) Full supervision: action category and action interval annotations for all actions in the untrimmed videos.

Papers

Showing 26–43 of 43 papers

| Title | Status | Hype |
|---|---|---|
| Constraint and Union for Partially-Supervised Temporal Sentence Grounding | | 0 |
| Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding | | 0 |
| Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training | | 0 |
| Learning to Focus on the Foreground for Temporal Sentence Grounding | | 0 |
| Hierarchical Local-Global Transformer for Temporal Sentence Grounding | | 0 |
| Reducing the Vision and Language Bias for Temporal Sentence Grounding | | 0 |
| Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video | | 0 |
| A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach | | 0 |
| Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding | | 0 |
| Temporal Sentence Grounding in Videos: A Survey and Future Directions | | 0 |
| Memory-Guided Semantic Learning Network for Temporal Sentence Grounding | | 0 |
| Exploring Motion and Appearance Information for Temporal Sentence Grounding | | 0 |
| Towards Debiasing Temporal Sentence Grounding in Video | | 0 |
| A Survey on Temporal Sentence Grounding in Videos | | 0 |
| Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding | | 0 |
| Context-aware Biaffine Localizing Network for Temporal Sentence Grounding | | 0 |
| A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric | Code | 0 |
| Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeCafNet | R1@0.7 | 47.55 | | Unverified |
| 2 | AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model) | R1@0.7 | 38.6 | | Unverified |
| 3 | AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model) | R1@0.7 | 35.6 | | Unverified |
| 4 | MMN (Full, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 32.2 | | Unverified |
| 5 | MMN (Full, I3D-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 29.8 | | Unverified |
| 6 | AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model) | R1@0.7 | 23.2 | | Unverified |
| 7 | AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model) | R1@0.7 | 22.4 | | Unverified |
| 8 | CPL (Weak, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 21.8 | | Unverified |
| 9 | AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.8 | | Unverified |
| 10 | AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeCafNet-100% | R@1,IoU=0.3 | 23.2 | | Unverified |
| 2 | DeCafNet-50% | R@1,IoU=0.3 | 21.29 | | Unverified |
| 3 | VSLNet | R@1,IoU=0.3 | 11.7 | | Unverified |
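
The metric names in the benchmark tables (R1@0.7 and R@1,IoU=0.3) are conventionally read as Recall@1 at a temporal IoU threshold: the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with an IoU of at least the threshold. The following is a minimal sketch of that reading, not the evaluation code of any listed paper. The function names `temporal_iou` and `recall_at_1`, and the `(start, end)` tuple representation in seconds, are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a moment is (start_sec, end_sec).
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Span], gts: List[Span], iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    preds[i] is the single top-ranked moment for query i; gts[i] is its
    ground-truth moment.
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(preds)


if __name__ == "__main__":
    # Toy data, not taken from any benchmark.
    predictions = [(0.0, 10.0), (2.0, 8.0)]
    ground_truth = [(0.0, 10.0), (5.0, 15.0)]
    print(recall_at_1(predictions, ground_truth, iou_threshold=0.7))
```

Under this reading, a stricter threshold such as 0.7 rewards tight boundary localization, while 0.3 accepts looser overlaps, so scores reported at different thresholds are not directly comparable.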