SOTAVerified

Temporal Sentence Grounding

Temporal sentence grounding (TSG) aims to locate the moment in an untrimmed video that corresponds to a given natural language query, i.e., to predict the start and end timestamps of that moment. Different levels of supervision are used for this task: 1) Weak supervision: only video-query pairs, with no temporal boundaries; 2) Semi-weak supervision: video-query pairs plus a single timestamp inside each target moment (a "glance" annotation); 3) Full supervision: the start and end timestamps of the moment described by each query.
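For sentence queries, the supervision levels differ in how much temporal information accompanies each video-query pair: none (weak), one timestamp inside the target moment (semi-weak, the "glance" setting of D3G in the list below), or the moment's start and end times (full). A minimal sketch of one training example under these assumptions; the class and field names (`GroundingSample`, `span`, `glance`) are hypothetical, times are assumed to be in seconds, and real datasets use their own annotation formats.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GroundingSample:
    """One (video, query) training pair; all field names are hypothetical."""
    video_id: str
    query: str
    span: Optional[Tuple[float, float]] = None  # full: (start, end) in seconds
    glance: Optional[float] = None  # semi-weak: one timestamp in seconds

    def __post_init__(self) -> None:
        # Reject malformed annotations early.
        if self.span is not None and not 0.0 <= self.span[0] < self.span[1]:
            raise ValueError("span must satisfy 0 <= start < end")
        if self.glance is not None and self.glance < 0.0:
            raise ValueError("glance must be non-negative")
        if self.span is not None and self.glance is not None:
            start, end = self.span
            if not start <= self.glance <= end:
                raise ValueError("glance must fall inside span")

    def supervision_level(self) -> str:
        # The richest annotation present determines the level.
        if self.span is not None:
            return "full"
        if self.glance is not None:
            return "semi-weak"
        return "weak"


samples = [
    GroundingSample("v1", "a person opens the door"),
    GroundingSample("v1", "a person opens the door", glance=12.0),
    GroundingSample("v1", "a person opens the door", span=(10.5, 14.0)),
]
for s in samples:
    print(s.supervision_level())  # prints weak, then semi-weak, then full
```

The same query can appear at every level; only the attached timestamps change, which is what lets weakly and semi-weakly supervised methods be compared against fully supervised ones on the same evaluation split.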

Papers

Showing 1–43 of 43 papers

Title | Status | Hype
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | Code | 2
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos | Code | 1
Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding | Code | 1
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | Code | 1
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation | Code | 1
Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning | Code | 1
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding | Code | 1
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval | Code | 1
Span-based Localizing Network for Natural Language Video Localization | Code | 1
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining | - | 0
Contrast-Unity for Partially-Supervised Temporal Sentence Grounding | - | 0
Diversified Augmentation with Domain Adaptation for Debiased Video Temporal Grounding | - | 0
Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network | - | 0
Transformer with Controlled Attention for Synchronous Motion Captioning | Code | 0
Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding | Code | 0
Video sentence grounding with temporally global textual knowledge | - | 0
Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video | Code | 0
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos | - | 0
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - | 0
Learning Temporal Sentence Grounding From Narrated EgoVideos | Code | 0
Temporal Sentence Grounding in Streaming Videos | Code | 0
Efficient Temporal Sentence Grounding in Videos with Multi-Teacher Knowledge Distillation | Code | 0
Transform-Equivariant Consistency Learning for Temporal Sentence Grounding | - | 0
You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos | - | 0
Tracking Objects and Activities with Attention for Temporal Sentence Grounding | - | 0
Constraint and Union for Partially-Supervised Temporal Sentence Grounding | - | 0
Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding | - | 0
Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training | - | 0
Learning to Focus on the Foreground for Temporal Sentence Grounding | - | 0
Hierarchical Local-Global Transformer for Temporal Sentence Grounding | - | 0
Reducing the Vision and Language Bias for Temporal Sentence Grounding | - | 0
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video | - | 0
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach | - | 0
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding | - | 0
Temporal Sentence Grounding in Videos: A Survey and Future Directions | - | 0
Memory-Guided Semantic Learning Network for Temporal Sentence Grounding | - | 0
Exploring Motion and Appearance Information for Temporal Sentence Grounding | - | 0
Towards Debiasing Temporal Sentence Grounding in Video | - | 0
A Survey on Temporal Sentence Grounding in Videos | - | 0
Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding | - | 0
Context-aware Biaffine Localizing Network for Temporal Sentence Grounding | - | 0
A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric | Code | 0
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | DeCafNet | R1@0.7 | 47.55 | - | Unverified
2 | AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model) | R1@0.7 | 38.6 | - | Unverified
3 | AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model) | R1@0.7 | 35.6 | - | Unverified
4 | MMN (Full, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 32.2 | - | Unverified
5 | MMN (Full, I3D-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 29.8 | - | Unverified
6 | AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model) | R1@0.7 | 23.2 | - | Unverified
7 | AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model) | R1@0.7 | 22.4 | - | Unverified
8 | CPL (Weak, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 21.8 | - | Unverified
9 | AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.8 | - | Unverified
10 | AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.1 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | DeCafNet-100% | R@1,IoU=0.3 | 23.2 | - | Unverified
2 | DeCafNet-50% | R@1,IoU=0.3 | 21.29 | - | Unverified
3 | VSLNet | R@1,IoU=0.3 | 11.7 | - | Unverified
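Both tables report top-1 recall at a temporal IoU threshold (R1@0.7 and R@1,IoU=0.3). Under the usual definition, R@n,IoU=m is the percentage of queries for which at least one of the model's top-n predicted moments overlaps the ground-truth moment with a temporal IoU of at least m. A minimal sketch, assuming one ground-truth span per query and predictions already sorted by score; the function names are illustrative, and individual benchmarks may differ in details such as how multiple annotations per query are handled.

```python
from typing import Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_n_iou(
    ranked_preds: Sequence[Sequence[Span]],
    gts: Sequence[Span],
    n: int = 1,
    iou_threshold: float = 0.7,
) -> float:
    """Percentage of queries with at least one of the top-n predicted
    spans reaching IoU >= iou_threshold against the ground truth."""
    if not gts or len(ranked_preds) != len(gts):
        raise ValueError("need one ranked prediction list per query")
    hits = sum(
        any(temporal_iou(p, gt) >= iou_threshold for p in preds[:n])
        for preds, gt in zip(ranked_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Two queries, each with a single (top-1) prediction.
preds = [[(0.0, 10.0)], [(5.0, 15.0)]]
gts = [(0.0, 10.0), (0.0, 10.0)]
print(recall_at_n_iou(preds, gts, n=1, iou_threshold=0.3))  # 100.0
print(recall_at_n_iou(preds, gts, n=1, iou_threshold=0.7))  # 50.0
```

The second prediction has IoU 5/15 ≈ 0.33 with its ground truth, so it counts as a hit at IoU=0.3 but a miss at IoU=0.7. For a fixed set of predictions, R@1 at a stricter threshold can therefore never exceed R@1 at a looser one.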