SOTAVerified

Temporal Sentence Grounding

Temporal sentence grounding (TSG) aims to locate a specific moment in an untrimmed video given a natural language query. Different levels of supervision are used for this task: 1) Weak supervision: a video-level action category set; 2) Semi-weak supervision: a video-level action category set plus action annotations at several timestamps; 3) Full supervision: action category and temporal interval annotations for all actions in the untrimmed videos.
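Fully supervised TSG is typically evaluated with Recall@1 at a temporal IoU threshold (e.g. the R1@0.7 metric used in the benchmark tables below). As a minimal sketch (the function names here are illustrative, not from any particular codebase), the metric can be computed like this:

```python
def temporal_iou(pred, gt):
    """IoU between two [start, end] temporal intervals (in seconds)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_1(predictions, ground_truths, threshold=0.7):
    """Fraction of queries whose top-1 predicted moment reaches
    IoU >= threshold with the ground-truth moment (R1@threshold)."""
    hits = sum(temporal_iou(p, g) >= threshold
               for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)
```

For example, a prediction of [0, 10] against a ground truth of [5, 15] yields IoU = 5/15 ≈ 0.33, which would not count as a hit at the 0.7 threshold.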

Papers

Showing 1–25 of 43 papers

| Title | Status | Hype |
|---|---|---|
| Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | Code | 2 |
| Uncovering Hidden Challenges in Query-Based Video Moment Retrieval | Code | 1 |
| D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation | Code | 1 |
| Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning | Code | 1 |
| DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos | Code | 1 |
| Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding | Code | 1 |
| BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | Code | 1 |
| Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding | Code | 1 |
| Span-based Localizing Network for Natural Language Video Localization | Code | 1 |
| Learning Temporal Sentence Grounding From Narrated EgoVideos | Code | 0 |
| Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos | Code | 0 |
| Temporal Sentence Grounding in Streaming Videos | Code | 0 |
| Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video | Code | 0 |
| A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric | Code | 0 |
| Transformer with Controlled Attention for Synchronous Motion Captioning | Code | 0 |
| Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding | Code | 0 |
| Efficient Temporal Sentence Grounding in Videos with Multi-Teacher Knowledge Distillation | Code | 0 |
| Temporal Sentence Grounding in Videos: A Survey and Future Directions | — | 0 |
| Towards Debiasing Temporal Sentence Grounding in Video | — | 0 |
| Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video | — | 0 |
| Tracking Objects and Activities with Attention for Temporal Sentence Grounding | — | 0 |
| Transform-Equivariant Consistency Learning for Temporal Sentence Grounding | — | 0 |
| Video sentence grounding with temporally global textual knowledge | — | 0 |
| Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training | — | 0 |
| Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining | — | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeCafNet | R1@0.7 | 47.55 | — | Unverified |
| 2 | AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model) | R1@0.7 | 38.6 | — | Unverified |
| 3 | AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model) | R1@0.7 | 35.6 | — | Unverified |
| 4 | MMN (Full, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 32.2 | — | Unverified |
| 5 | MMN (Full, I3D-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 29.8 | — | Unverified |
| 6 | AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model) | R1@0.7 | 23.2 | — | Unverified |
| 7 | AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model) | R1@0.7 | 22.4 | — | Unverified |
| 8 | CPL (Weak, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 21.8 | — | Unverified |
| 9 | AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.8 | — | Unverified |
| 10 | AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.1 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeCafNet-100% | R@1, IoU=0.3 | 23.2 | — | Unverified |
| 2 | DeCafNet-50% | R@1, IoU=0.3 | 21.29 | — | Unverified |
| 3 | VSLNet | R@1, IoU=0.3 | 11.7 | — | Unverified |