Moment Retrieval
Moment retrieval can be defined as the task of "localizing moments in a video given a user query".
Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | UnLoc-L | R@1 IoU=0.5 | 66.1 | — | Unverified |
| 2 | UnLoc-B | R@1 IoU=0.5 | 64.5 | — | Unverified |
| 3 | DenoiseLoc | R@1 IoU=0.5 | 59.27 | — | Unverified |
| 4 | SG-DETR (w/ PT) | mAP | 58.8 | — | Unverified |
| 5 | SG-DETR | mAP | 54.1 | — | Unverified |
| 6 | LLaVA-MR | mAP | 52.73 | — | Unverified |
| 7 | FlashVTG | mAP | 52 | — | Unverified |
| 8 | InternVideo2-6B | mAP | 49.24 | — | Unverified |
| 9 | CG-DETR (w/ PT) | mAP | 47.97 | — | Unverified |
| 10 | VideoLights-B-pt | mAP | 47.94 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SG-DETR (w/ PT) | R@1 IoU=0.5 | 71.1 | — | Unverified |
| 2 | LLaVA-MR | R@1 IoU=0.5 | 70.65 | — | Unverified |
| 3 | FlashVTG | R@1 IoU=0.5 | 70.32 | — | Unverified |
| 4 | SG-DETR | R@1 IoU=0.5 | 70.2 | — | Unverified |
| 5 | InternVideo2-6B | R@1 IoU=0.5 | 70.03 | — | Unverified |
| 6 | InternVideo2-1B | R@1 IoU=0.5 | 68.36 | — | Unverified |
| 7 | VideoChat-T (FT) | R@1 IoU=0.5 | 67.1 | — | Unverified |
| 8 | UniMD+Sync. | R@1 IoU=0.5 | 63.98 | — | Unverified |
| 9 | LD-DETR | R@1 IoU=0.5 | 62.58 | — | Unverified |
| 10 | VideoLights-B-pt | R@1 IoU=0.5 | 61.96 | — | Unverified |
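
The R@1 IoU=0.5 metric in the tables above is commonly read as the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The sketch below illustrates that reading and is not any benchmark's official evaluation script. The names `temporal_iou` and `recall_at_1` are hypothetical. It assumes one ground-truth span per query, spans given as `(start, end)` in seconds, and an inclusive threshold, and individual benchmarks may differ on each of these points.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) span; this representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(
    top1_preds: List[Span], gts: List[Span], iou_threshold: float = 0.5
) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes exactly one ground-truth span per query and an inclusive (>=)
    comparison; both are illustrative choices, not taken from the source.
    """
    if len(top1_preds) != len(gts):
        raise ValueError("need one top-1 prediction per ground-truth span")
    if not gts:
        return 0.0
    hits = sum(
        temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Toy data: the first prediction overlaps its target well, the second misses.
    preds = [(2.0, 10.0), (0.0, 10.0)]
    targets = [(0.0, 10.0), (20.0, 30.0)]
    print(recall_at_1(preds, targets))
```

In the toy example the first pair has an IoU of 0.8 and the second an IoU of 0, so one of two queries counts as a hit. Returning a percentage rather than a fraction matches the scale of the "Claimed" column.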