Temporal Sentence Grounding
Temporal sentence grounding (TSG) aims to localize the moment in an untrimmed video that matches a given natural language query. Methods for this task are trained under different levels of supervision: 1) Weak supervision: only the video-level action category set; 2) Semi-weak supervision: the video-level action category set plus action annotations at several timestamps; 3) Full supervision: action category and action interval annotations for all actions in the untrimmed videos.
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeCafNet | R1@0.7 | 47.55 | — | Unverified |
| 2 | AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model) | R1@0.7 | 38.6 | — | Unverified |
| 3 | AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model) | R1@0.7 | 35.6 | — | Unverified |
| 4 | MMN (Full, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 32.2 | — | Unverified |
| 5 | MMN (Full, I3D-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 29.8 | — | Unverified |
| 6 | AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model) | R1@0.7 | 23.2 | — | Unverified |
| 7 | AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model) | R1@0.7 | 22.4 | — | Unverified |
| 8 | CPL (Weak, MViT-K400-Pretrain-feature, evaluated by AdaFocus) | R1@0.7 | 21.8 | — | Unverified |
| 9 | AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.8 | — | Unverified |
| 10 | AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model) | R1@0.7 | 21.1 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeCafNet-100% | R@1,IoU=0.3 | 23.2 | — | Unverified |
| 2 | DeCafNet-50% | R@1,IoU=0.3 | 21.29 | — | Unverified |
| 3 | VSLNet | R@1,IoU=0.3 | 11.7 | — | Unverified |
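
The metrics in the tables above (R1@0.7 and R@1,IoU=0.3) follow the usual "recall at rank 1" convention: the percentage of queries whose top-ranked predicted segment overlaps the ground-truth segment with a temporal IoU at or above the threshold. The sketch below illustrates that computation under stated assumptions; the function names `temporal_iou` and `recall_at_1` and the `(start, end)` tuple layout are hypothetical and not taken from any of the listed papers, whose evaluation scripts may differ in details such as tie handling or rounding.

```python
from typing import List, Tuple

# A segment is (start_sec, end_sec); this layout is an assumption for illustration.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(preds: List[Segment], gts: List[Segment], iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    preds[i] is the top-ranked segment for query i, gts[i] its ground truth.
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Toy data: three queries, one top-1 prediction each.
    top1_predictions = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
    ground_truths = [(0.0, 10.0), (10.0, 20.0), (0.0, 5.0)]
    for threshold in (0.3, 0.7):
        score = recall_at_1(top1_predictions, ground_truths, threshold)
        print(f"R@1, IoU={threshold}: {score:.2f}")
```

A stricter threshold such as 0.7 only counts predictions whose boundaries nearly coincide with the annotation, which is one reason scores reported at IoU=0.7 and IoU=0.3 are not directly comparable.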