Video Retrieval
The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the query. Systems typically return a ranked list of candidates and are scored with standard retrieval metrics such as recall at rank K (R@K), the percentage of queries whose correct video appears among the top K results.
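As a rough illustration of how a text-to-video R@K score can be computed, the sketch below assumes a precomputed query-by-video similarity matrix in which the correct video for query `i` is video `i`. This one-to-one pairing is an assumption and may not match every benchmark's protocol. The function name `recall_at_k` and the toy numbers are hypothetical and not taken from any listed model.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose correct video is ranked in the top k.

    Assumes similarity[i][j] scores query i against video j, and that the
    correct video for query i is video i (an assumed one-to-one test setup).
    """
    hits = 0
    for i, scores in enumerate(similarity):
        # Zero-based rank of the correct video: how many videos score
        # strictly higher. Ties are resolved in favour of the correct video.
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy similarity matrix: 3 queries x 3 videos (made-up numbers).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.6],  # query 1: correct video ranked 3rd
    [0.2, 0.7, 0.4],  # query 2: correct video ranked 2nd
]
r1 = recall_at_k(sim, 1)  # one of three queries hits
r2 = recall_at_k(sim, 2)  # two of three queries hit
r3 = recall_at_k(sim, 3)  # all queries hit
```

Scores reported at different cutoffs are not directly comparable: R@10 is always at least as high as R@1 for the same model, since a looser cutoff can only add hits.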
Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OmniVec | text-to-video R@10 | 89.4 | — | Unverified |
| 2 | CLIP4Clip | text-to-video R@10 | 81.6 | — | Unverified |
| 3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | — | Unverified |
| 4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | — | Unverified |
| 5 | CLIP-ViP | text-to-video R@1 | 57.7 | — | Unverified |
| 6 | PIDRo | text-to-video R@1 | 55.9 | — | Unverified |
| 7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | — | Unverified |
| 8 | HunYuan_tvr | text-to-video R@1 | 55 | — | Unverified |
| 9 | MuLTI | text-to-video R@1 | 54.7 | — | Unverified |
| 10 | EERCF | text-to-video R@1 | 54.1 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Aurora (r=64) | text-to-video R@5 | 77.4 | — | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 74.2 | — | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | — | Unverified |
| 4 | VAST | text-to-video R@1 | 72 | — | Unverified |
| 5 | COSA | text-to-video R@1 | 70.5 | — | Unverified |
| 6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | — | Unverified |
| 7 | GRAM | text-to-video R@1 | 67.3 | — | Unverified |
| 8 | VALOR | text-to-video R@1 | 61.5 | — | Unverified |
| 9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | — | Unverified |
| 10 | VindLU | text-to-video R@1 | 61.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GRAM | text-to-video R@1 | 64 | — | Unverified |
| 2 | VAST | text-to-video R@1 | 63.9 | — | Unverified |
| 3 | InternVideo2-6B | text-to-video R@1 | 62.8 | — | Unverified |
| 4 | VALOR | text-to-video R@1 | 59.9 | — | Unverified |
| 5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | — | Unverified |
| 6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | — | Unverified |
| 7 | COSA | text-to-video R@1 | 57.9 | — | Unverified |
| 8 | InternVideo2-6B | text-to-video R@1 | 55.9 | — | Unverified |
| 9 | InternVideo | text-to-video R@1 | 55.2 | — | Unverified |
| 10 | VLAB | text-to-video R@1 | 55.1 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EMCL-Net++ | text-to-video R@10 | 53.7 | — | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 46.4 | — | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | — | Unverified |
| 4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | — | Unverified |
| 5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | — | Unverified |
| 6 | COSA | text-to-video R@1 | 39.4 | — | Unverified |
| 7 | mPLUG-2 | text-to-video R@1 | 34.4 | — | Unverified |
| 8 | VALOR | text-to-video R@1 | 34.2 | — | Unverified |
| 9 | InternVideo | text-to-video R@1 | 34 | — | Unverified |
| 10 | InternVideo2-6B | text-to-video R@1 | 33.8 | — | Unverified |