Visual Dialog
Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
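To make the task inputs concrete, here is a minimal sketch of one dialog round as a data structure, plus a helper that ranks candidate answers. It assumes a discriminative setup where the model scores a fixed list of candidate answers. The names `Turn`, `VisualDialogExample`, `rank_candidates` and the toy word-overlap scorer are hypothetical and not taken from any particular benchmark toolkit. A real model would score candidates using image features as well as text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    question: str
    answer: str


@dataclass
class VisualDialogExample:
    image_id: str          # hypothetical identifier for the image
    history: List[Turn]    # previous question/answer rounds
    question: str          # the follow-up question to answer
    candidates: List[str]  # candidate answers to rank (assumed discriminative setup)


# A scorer maps (example, candidate answer) to a real-valued score; higher is better.
Scorer = Callable[[VisualDialogExample, str], float]


def rank_candidates(example: VisualDialogExample, score: Scorer) -> List[int]:
    """Return the 1-based rank of each candidate (1 = best), in candidate order."""
    scores = [score(example, c) for c in example.candidates]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def overlap_score(example: VisualDialogExample, candidate: str) -> float:
    # Toy scorer: counts words shared between the candidate and the
    # question plus dialog history (ignores the image entirely).
    context = example.question.split()
    for turn in example.history:
        context += turn.question.split() + turn.answer.split()
    context_words = {w.lower() for w in context}
    return float(len(context_words & set(candidate.lower().split())))


ex = VisualDialogExample(
    image_id="img_0001",
    history=[Turn("is there a dog?", "yes, a brown dog")],
    question="what color is the dog?",
    candidates=["it is brown", "the dog is brown", "no"],
)
print(rank_candidates(ex, overlap_score))  # prints [2, 1, 3]
```

The history matters even for this toy scorer: the word "brown" only appears in an earlier answer, which is exactly the kind of context a dialog model is expected to carry forward.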
Papers
118 papers are listed for this task.
Datasets: Visual Dialog v1.0 test-std, VisDial v0.9 val, VisDial v1.0 test-std, BlendedSkillTalk, ConvAI2, EmpatheticDialogues, Image-Chat, Wizard of Wikipedia.
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Single | NDCG (x 100) | 78.7 | — | Unverified |
| 2 | P1P2+Distill+Ensemble | NDCG (x 100) | 77.92 | — | Unverified |
| 3 | Ensemble + Fine-tuning | NDCG (x 100) | 76.43 | — | Unverified |
| 4 | ensemble, finetune | NDCG (x 100) | 76.17 | — | Unverified |
| 5 | VD-PCR | NDCG (x 100) | 76.14 | — | Unverified |
| 6 | Ensemble | NDCG (x 100) | 75.35 | — | Unverified |
| 7 | Ensemble + Finetune | NDCG (x 100) | 74.88 | — | Unverified |
| 8 | bert-double-stream-finetuning | NDCG (x 100) | 74.62 | — | Unverified |
| 9 | CE-finetuned, single model | NDCG (x 100) | 74.47 | — | Unverified |
| 10 | 2 | NDCG (x 100) | 73.36 | — | Unverified |