SOTAVerified

Visual Dialog

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
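The three inputs named above (an image, a dialog history, and a follow-up question) can be sketched as a small data structure. This is only an illustration, not the format of any particular dataset or model: the names `VisualDialogExample`, `flatten_context`, `image_path`, and the `Q:`/`A:` layout are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One Visual Dialog query (hypothetical container, for illustration only)."""
    image_path: str  # assumed: a reference to the image under discussion
    history: List[Tuple[str, str]] = field(default_factory=list)  # earlier (question, answer) rounds
    question: str = ""  # the follow-up question the agent must answer


def flatten_context(example: VisualDialogExample) -> str:
    """Join the dialog history and the follow-up question into one text context."""
    lines = []
    for q, a in example.history:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {example.question}")
    lines.append("A:")  # left open: this is the slot the agent fills in
    return "\n".join(lines)


example = VisualDialogExample(
    image_path="example.jpg",  # placeholder file name
    history=[("How many people are in the picture?", "Two.")],
    question="What are they doing?",
)
print(flatten_context(example))
```

A model for this task would consume the image together with a context like the one printed here and produce the missing answer; how the image is encoded is left out of the sketch on purpose.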

Papers

Showing 1-25 of 118 papers

| Title | Status | Hype |
|---|---|---|
| Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7 |
| Hawk: Learning to Understand Open-World Video Anomalies | Code | 3 |
| VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution | Code | 1 |
| Unified Multimodal Model with Unlikelihood Training for Visual Dialog | Code | 1 |
| Visual Dialogue State Tracking for Question Generation | Code | 1 |
| Video Dialog as Conversation about Objects Living in Space-Time | Code | 1 |
| Iterative Context-Aware Graph Inference for Visual Dialog | Code | 1 |
| Ensemble of MRR and NDCG models for Visual Dialog | Code | 1 |
| Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning | Code | 1 |
| The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | Code | 1 |
| Visual Dialog | Code | 1 |
| Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 | Code | 1 |
| An Annotated Corpus of Reference Resolution for Interpreting Common Grounding | Code | 1 |
| Where Are You? Localization from Embodied Dialog | Code | 1 |
| Hierarchical Question-Image Co-Attention for Visual Question Answering | Code | 1 |
| Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation | Code | 1 |
| Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog | Code | 1 |
| History for Visual Dialog: Do we really need it? | Code | 1 |
| Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer | Code | 1 |
| Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | Code | 1 |
| VD-BERT: A Unified Vision and Dialog Transformer with BERT | Code | 1 |
| Multi-View Attention Network for Visual Dialog | Code | 1 |
| Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation | | 0 |
| Adversarial Robustness of Visual Dialog | | 0 |
| Gold Seeker: Information Gain from Policy Distributions for Goal-oriented Vision-and-Langauge Reasoning | | 0 |

Benchmark Results

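| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Single | NDCG (x 100) | 78.7 | | Unverified |
| 2 | P1P2+Distill+Ensemble | NDCG (x 100) | 77.92 | | Unverified |
| 3 | Ensemble + Fine-tuning | NDCG (x 100) | 76.43 | | Unverified |
| 4 | ensemble, finetune | NDCG (x 100) | 76.17 | | Unverified |
| 5 | VD-PCR | NDCG (x 100) | 76.14 | | Unverified |
| 6 | Ensemble | NDCG (x 100) | 75.35 | | Unverified |
| 7 | Ensemble + Finetune | NDCG (x 100) | 74.88 | | Unverified |
| 8 | bert-double-stream-finetuning | NDCG (x 100) | 74.62 | | Unverified |
| 9 | CE-finetuned, single model | NDCG (x 100) | 74.47 | | Unverified |
| 10 | 2 | NDCG (x 100) | 73.36 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 9xFGA (VGG) | MRR | 68.92 | | Unverified |
| 2 | DAN | MRR | 66.38 | | Unverified |
| 3 | CorefNMN (ResNet-152) | MRR | 64.1 | | Unverified |
| 4 | CoAtt | MRR | 63.98 | | Unverified |
| 5 | CorefNMN | MRR | 63.6 | | Unverified |
| 6 | DualVD | MRR | 62.94 | | Unverified |
| 7 | SF-QIH-se-2 | MRR | 62.42 | | Unverified |
| 8 | HCIAE-NP-ATT | MRR | 62.22 | | Unverified |
| 9 | HieCoAtt-QI | MRR | 57.88 | | Unverified |
| 10 | AMEM | R@1 | 48.53 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 5xFGA + LS | NDCG | 64.04 | | Unverified |
| 2 | 5xFGA + LS*+ | MRR | 0.71 | | Unverified |
| 3 | Two-Step | MRR | 0.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 1.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 1.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 40 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Multi-Modal BlenderBot | BLEU-4 | 2.2 | | Unverified |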