SOTAVerified

Temporal Sentence Grounding

Temporal sentence grounding (TSG) aims to locate the moment in an untrimmed video that matches a given natural language query. Different levels of supervision are used for this task: 1) Weak supervision: a video-level action category set; 2) Semi-weak supervision: a video-level action category set plus action annotations at several timestamps; 3) Full supervision: action category and action interval annotations for all actions in the untrimmed videos.

Papers

Showing 26-43 of 43 papers

| Title | Status | Hype |
|---|---|---|
| A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach | | 0 |
| You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos | | 0 |
| Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | | 0 |
| A Survey on Temporal Sentence Grounding in Videos | | 0 |
| Constraint and Union for Partially-Supervised Temporal Sentence Grounding | | 0 |
| Context-aware Biaffine Localizing Network for Temporal Sentence Grounding | | 0 |
| Contrast-Unity for Partially-Supervised Temporal Sentence Grounding | | 0 |
| Diversified Augmentation with Domain Adaptation for Debiased Video Temporal Grounding | | 0 |
| Exploring Motion and Appearance Information for Temporal Sentence Grounding | | 0 |
| Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding | | 0 |
| Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos | | 0 |
| Hierarchical Local-Global Transformer for Temporal Sentence Grounding | | 0 |
| Learning to Focus on the Foreground for Temporal Sentence Grounding | | 0 |
| Memory-Guided Semantic Learning Network for Temporal Sentence Grounding | | 0 |
| Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network | | 0 |
| Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding | | 0 |
| Reducing the Vision and Language Bias for Temporal Sentence Grounding | | 0 |
| Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeCafNet | R1@0.7 | 47.55 | | Unverified |
| 2 | AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model) | R1@0.7 | 38.6 | | Unverified |
| 3 | AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model) | R1@0.7 | 35.6 | | Unverified |
| 4 | MMN (Full, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 32.2 | | Unverified |
| 5 | MMN (Full, I3D-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 29.8 | | Unverified |
| 6 | AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model) | R1@0.7 | 23.2 | | Unverified |
| 7 | AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model) | R1@0.7 | 22.4 | | Unverified |
| 8 | CPL (Weak, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 21.8 | | Unverified |
| 9 | AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.8 | | Unverified |
| 10 | AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeCafNet-100% | R@1,IoU=0.3 | 23.2 | | Unverified |
| 2 | DeCafNet-50% | R@1,IoU=0.3 | 21.29 | | Unverified |
| 3 | VSLNet | R@1,IoU=0.3 | 11.7 | | Unverified |
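
The metrics in the tables above (R1@0.7 and R@1,IoU=0.3) are usually read as "the percentage of queries whose top-1 predicted moment overlaps the ground-truth moment with a temporal IoU at or above the threshold". The page does not state the exact evaluation protocol behind each entry, so the following is only a minimal sketch of that common definition. The function names `temporal_iou` and `recall_at_1`, the `(start, end)` span format in seconds, and the use of `>=` at the threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical span format: (start_seconds, end_seconds)
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Span], gts: List[Span], iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes one top-ranked prediction and one ground-truth span per query,
    and counts IoU == threshold as a hit (an assumption).
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


# Toy data, not taken from any benchmark.
preds = [(2.0, 10.0), (4.0, 8.0)]
gts = [(3.0, 10.0), (4.0, 12.0)]
print(recall_at_1(preds, gts, 0.7))  # first query IoU 0.875, second 0.5
print(recall_at_1(preds, gts, 0.3))
```

On this toy data only the first query clears the 0.7 threshold while both clear 0.3, which illustrates why scores reported at a stricter IoU threshold tend to be lower for the same model.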