Video Captioning
Video captioning is the task of automatically generating a textual description of a video by understanding the actions and events it contains, which in turn enables efficient text-based retrieval of the video.
Source: NITS-VC System for VATEX Video Captioning Challenge 2020
Datasets: MSR-VTT, MSVD, YouCook2, VATEX, ActivityNet Captions, MSRVTT-CTN, MSVD-CTN, Hindi MSR-VTT, TVC, ChinaOpen-1k, MSVD-Indonesian, Shot2Story20K
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | mPLUG-2 | CIDEr | 80 | — | Unverified |
| 2 | VAST | CIDEr | 78 | — | Unverified |
| 3 | GIT2 | CIDEr | 75.9 | — | Unverified |
| 4 | VLAB | CIDEr | 74.9 | — | Unverified |
| 5 | COSA | CIDEr | 74.7 | — | Unverified |
| 6 | VALOR | CIDEr | 74 | — | Unverified |
| 7 | MaMMUT | CIDEr | 73.6 | — | Unverified |
| 8 | VideoCoCa | CIDEr | 73.2 | — | Unverified |
| 9 | RTQ | CIDEr | 69.3 | — | Unverified |
| 10 | HowToCaption | CIDEr | 65.3 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MaMMUT | CIDEr | 195.6 | — | Unverified |
| 2 | VLAB | CIDEr | 179.8 | — | Unverified |
| 3 | COSA | CIDEr | 178.5 | — | Unverified |
| 4 | VALOR | CIDEr | 178.5 | — | Unverified |
| 5 | mPLUG-2 | CIDEr | 165.8 | — | Unverified |
| 6 | HowToCaption | CIDEr | 154.2 | — | Unverified |
| 7 | HiTeA | CIDEr | 146.9 | — | Unverified |
| 8 | Vid2Seq | CIDEr | 146.2 | — | Unverified |
| 9 | VIOLETv2 | CIDEr | 139.2 | — | Unverified |
| 10 | RTQ | CIDEr | 123.4 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VAST | BLEU-4 | 18.2 | — | Unverified |
| 2 | UniVL + MELTR | BLEU-4 | 17.92 | — | Unverified |
| 3 | UniVL | BLEU-4 | 17.35 | — | Unverified |
| 4 | VideoCoCa | BLEU-4 | 14.2 | — | Unverified |
| 5 | VLM | BLEU-4 | 12.27 | — | Unverified |
| 6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | — | Unverified |
| 7 | TextKG | BLEU-4 | 11.7 | — | Unverified |
| 8 | COOT | BLEU-4 | 11.3 | — | Unverified |
| 9 | COSA | BLEU-4 | 10.1 | — | Unverified |
| 10 | HowToCaption | BLEU-4 | 8.8 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VALOR | BLEU-4 | 45.6 | — | Unverified |
| 2 | VAST | BLEU-4 | 45 | — | Unverified |
| 3 | COSA | BLEU-4 | 43.7 | — | Unverified |
| 4 | VideoCoCa | BLEU-4 | 39.7 | — | Unverified |
| 5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | — | Unverified |
| 6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | — | Unverified |
| 7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | — | Unverified |
| 8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | — | Unverified |
| 9 | ORG-TRL | BLEU-4 | 32.1 | — | Unverified |
| 10 | NITS-VC | BLEU-4 | 20 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VideoCoCa | BLEU-4 | 14.7 | — | Unverified |
| 2 | VLTinT (ae-test split) C3D/Ling | BLEU-4 | 14.5 | — | Unverified |
| 3 | VLCap (ae-test split) - Appearance + Language | BLEU-4 | 13.38 | — | Unverified |
| 4 | COOT (ae-test split) - Only Appearance features | BLEU-4 | 10.85 | — | Unverified |
| 5 | MART (ae-test split) - Appearance + Flow | BLEU-4 | 10.33 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SBD_Keyframe | BLEU-4 | 41.01 | — | Unverified |
| 2 | V+S-Att-based | BLEU-4 | 36.2 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GVT | BLEU-4 | 17.7 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Shot2Story | CIDEr | 37.4 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Vid2Seq | CIDEr | 120.5 | — | Unverified |
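
Several of the tables above report BLEU-4. As a rough illustration of what that number measures, the sketch below computes a simplified sentence-level BLEU-4: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty and scaled to 0-100. It rests on assumptions that are not taken from this page: whitespace tokenization, lowercasing, no smoothing, and per-sentence scoring. The function names `ngrams` and `bleu4` are hypothetical. Published results are normally computed at corpus level with a standard evaluation toolkit, so this sketch will not reproduce the scores listed here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Simplified sentence-level BLEU-4 on a 0-100 scale (assumption: no smoothing)."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return 100.0 * bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    refs = ["a man is playing a guitar"]
    print(bleu4("a man is playing a guitar on stage", refs))
```

Because no smoothing is applied, any caption with no matching 4-gram scores exactly zero, which is one reason leaderboard numbers are aggregated over a whole test set rather than averaged per sentence.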