SOTAVerified

Visual Dialog

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
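
As a rough illustration of the task inputs described above, the following minimal sketch represents one query as an image, a dialog history, and a follow-up question, then flattens the text side into a single prompt. All names here (`VisualDialogExample`, `build_text_context`, the image path) are hypothetical and not taken from any particular dataset or model on this page; a real system would also encode the image itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One Visual Dialog query (hypothetical container, for illustration only)."""
    image_path: str  # assumed location of the image the dialog is about
    history: List[Tuple[str, str]] = field(default_factory=list)  # earlier (question, answer) rounds
    question: str = ""  # the follow-up question to be answered


def build_text_context(example: VisualDialogExample) -> str:
    """Flatten the dialog history and the follow-up question into one text prompt."""
    lines = []
    for past_question, past_answer in example.history:
        lines.append(f"Q: {past_question}")
        lines.append(f"A: {past_answer}")
    lines.append(f"Q: {example.question}")
    lines.append("A:")  # left open: this is the answer the agent must produce
    return "\n".join(lines)


example = VisualDialogExample(
    image_path="images/example.jpg",
    history=[("What is in the picture?", "A dog on a beach.")],
    question="What color is it?",
)
print(build_text_context(example))
```

The history matters because a follow-up question such as "What color is it?" can only be resolved by looking back at earlier rounds.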

Papers

Showing 51–75 of 118 papers

| Title | Status | Hype |
|---|---|---|
| Multi-Modal Open-Domain Dialogue | | 0 |
| Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog | | 0 |
| ORD: Object Relationship Discovery for Visual Dialogue Generation | | 0 |
| PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding | | 0 |
| Probabilistic framework for solving Visual Dialog | | 0 |
| Pushing the Limits of Radiology with Joint Modeling of Visual and Textual Information | | 0 |
| Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling | | 0 |
| Reasoning Over History: Context Aware Visual Dialog | | 0 |
| Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog | | 0 |
| Region under Discussion for visual dialog | | 0 |
| Response to "Visual Dialogue without Vision or Dialogue" (Massiceti et al., 2018) | | 0 |
| The Impact of Answers in Referential Visual Dialog | | 0 |
| The World in My Mind: Visual Dialog with Adversarial Multi-modal Feature Encoding | | 0 |
| Towards Visual Dialog for Radiology | | 0 |
| Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering | | 0 |
| UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog | | 0 |
| V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0 |
| V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0 |
| Variational Disentangled Attention for Regularized Visual Dialog | | 0 |
| ViDA-MAN: Visual Dialog with Digital Humans | | 0 |
| On Controlled DeEntanglement for Natural Language Processing | | 0 |
| Vision and Language: from Visual Perception to Content Creation | | 0 |
| Visual Reference Resolution using Attention Memory for Visual Dialog | | 0 |
| Visual-Textual Alignment for Graph Inference in Visual Dialog | | 0 |
| VU-BERT: A Unified framework for Visual Dialog | | 0 |

Benchmark Results

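| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Single | NDCG (x 100) | 78.7 | | Unverified |
| 2 | P1P2+Distill+Ensemble | NDCG (x 100) | 77.92 | | Unverified |
| 3 | Ensemble + Fine-tuning | NDCG (x 100) | 76.43 | | Unverified |
| 4 | ensemble, finetune | NDCG (x 100) | 76.17 | | Unverified |
| 5 | VD-PCR | NDCG (x 100) | 76.14 | | Unverified |
| 6 | Ensemble | NDCG (x 100) | 75.35 | | Unverified |
| 7 | Ensemble + Finetune | NDCG (x 100) | 74.88 | | Unverified |
| 8 | bert-double-stream-finetuning | NDCG (x 100) | 74.62 | | Unverified |
| 9 | CE-finetuned, single model | NDCG (x 100) | 74.47 | | Unverified |
| 10 | 2 | NDCG (x 100) | 73.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 9xFGA (VGG) | MRR | 68.92 | | Unverified |
| 2 | DAN | MRR | 66.38 | | Unverified |
| 3 | CorefNMN (ResNet-152) | MRR | 64.1 | | Unverified |
| 4 | CoAtt | MRR | 63.98 | | Unverified |
| 5 | CorefNMN | MRR | 63.6 | | Unverified |
| 6 | DualVD | MRR | 62.94 | | Unverified |
| 7 | SF-QIH-se-2 | MRR | 62.42 | | Unverified |
| 8 | HCIAE-NP-ATT | MRR | 62.22 | | Unverified |
| 9 | HieCoAtt-QI | MRR | 57.88 | | Unverified |
| 10 | AMEM | R@1 | 48.53 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 5xFGA + LS | NDCG | 64.04 | | Unverified |
| 2 | 5xFGA + LS*+ | MRR | 0.71 | | Unverified |
| 3 | Two-Step | MRR | 0.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 1.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 1.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 40 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 2.2 | | Unverified |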