Visual Dialog
Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
Papers
Showing 1–10 of 118 papers
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 9xFGA (VGG) | MRR | 68.92 | — | Unverified |
| 2 | DAN | MRR | 66.38 | — | Unverified |
| 3 | CorefNMN (ResNet-152) | MRR | 64.1 | — | Unverified |
| 4 | CoAtt | MRR | 63.98 | — | Unverified |
| 5 | CorefNMN | MRR | 63.6 | — | Unverified |
| 6 | DualVD | MRR | 62.94 | — | Unverified |
| 7 | SF-QIH-se-2 | MRR | 62.42 | — | Unverified |
| 8 | HCIAE-NP-ATT | MRR | 62.22 | — | Unverified |
| 9 | HieCoAtt-QI | MRR | 57.88 | — | Unverified |
| 10 | AMEM | R@1 | 48.53 | — | Unverified |
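
The table reports two retrieval-style metrics, MRR (mean reciprocal rank) and R@1 (recall at 1). Both are commonly computed from the rank a model assigns to the ground-truth answer within a list of candidate answers for each question. The sketch below illustrates the standard definitions, assuming 1-based ranks. The function names and sample ranks are hypothetical, chosen only for illustration, and are not taken from any model listed above. Scaling the result to a percentage is also an assumption about how the table's scores are expressed.

```python
from typing import Sequence


def mean_reciprocal_rank(gt_ranks: Sequence[int]) -> float:
    """Average of 1/rank over all questions (ranks are 1-based)."""
    if not gt_ranks:
        raise ValueError("gt_ranks must be non-empty")
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def recall_at_k(gt_ranks: Sequence[int], k: int) -> float:
    """Fraction of questions whose ground-truth answer is ranked in the top k."""
    if not gt_ranks:
        raise ValueError("gt_ranks must be non-empty")
    return sum(1 for r in gt_ranks if r <= k) / len(gt_ranks)


# Hypothetical ranks of the ground-truth answer for four questions.
ranks = [1, 2, 4, 1]

mrr = mean_reciprocal_rank(ranks)  # (1 + 1/2 + 1/4 + 1) / 4
r_at_1 = recall_at_k(ranks, k=1)   # two of four answers ranked first

# Report as percentages (assumed to match the scale used in the table).
print(f"MRR: {100 * mrr:.2f}  R@1: {100 * r_at_1:.2f}")
# prints "MRR: 68.75  R@1: 50.00"
```

Because MRR rewards near misses while R@1 counts only exact top-rank hits, scores on the two metrics are not directly comparable, which matters when reading rows that report different metrics.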