SOTAVerified

Visual Dialog

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
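
The task inputs described above (an image, a dialog history, and a follow-up question) can be sketched as a small data structure. This is an illustrative sketch only: the class name `VisualDialogExample`, the helper `build_text_context`, and the `" [SEP] "` separator are assumptions made for the example, not the format of any listed paper or dataset.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One Visual Dialog query (hypothetical container, for illustration)."""
    image_path: str                                   # the image under discussion
    history: List[Tuple[str, str]] = field(default_factory=list)  # earlier (question, answer) rounds
    question: str = ""                                # the follow-up question to answer


def build_text_context(example: VisualDialogExample, sep: str = " [SEP] ") -> str:
    """Flatten the dialog history and the new question into one string.

    The separator is an assumption; real models choose their own encoding.
    """
    turns = [f"Q: {q} A: {a}" for q, a in example.history]
    turns.append(f"Q: {example.question}")
    return sep.join(turns)


example = VisualDialogExample(
    image_path="img.jpg",
    history=[("Is it sunny?", "yes")],
    question="How many people?",
)
print(build_text_context(example))
```

A model would consume the image alongside this text context and produce (or rank) an answer to the final question.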

Papers

Showing 1-25 of 118 papers

Title | Status | Hype
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7
Hawk: Learning to Understand Open-World Video Anomalies | Code | 3
Unified Multimodal Model with Unlikelihood Training for Visual Dialog | Code | 1
Video Dialog as Conversation about Objects Living in Space-Time | Code | 1
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution | Code | 1
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | Code | 1
Ensemble of MRR and NDCG models for Visual Dialog | Code | 1
Where Are You? Localization from Embodied Dialog | Code | 1
History for Visual Dialog: Do we really need it? | Code | 1
Multi-View Attention Network for Visual Dialog | Code | 1
VD-BERT: A Unified Vision and Dialog Transformer with BERT | Code | 1
Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer | Code | 1
Iterative Context-Aware Graph Inference for Visual Dialog | Code | 1
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | Code | 1
An Annotated Corpus of Reference Resolution for Interpreting Common Grounding | Code | 1
Visual Dialogue State Tracking for Question Generation | Code | 1
Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation | Code | 1
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 | Code | 1
Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog | Code | 1
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning | Code | 1
Visual Dialog | Code | 1
Hierarchical Question-Image Co-Attention for Visual Question Answering | Code | 1
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0
Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Single | NDCG (x 100) | 78.7 | | Unverified
2 | P1P2+Distill+Ensemble | NDCG (x 100) | 77.92 | | Unverified
3 | Ensemble + Fine-tuning | NDCG (x 100) | 76.43 | | Unverified
4 | ensemble, finetune | NDCG (x 100) | 76.17 | | Unverified
5 | VD-PCR | NDCG (x 100) | 76.14 | | Unverified
6 | Ensemble | NDCG (x 100) | 75.35 | | Unverified
7 | Ensemble + Finetune | NDCG (x 100) | 74.88 | | Unverified
8 | bert-double-stream-finetuning | NDCG (x 100) | 74.62 | | Unverified
9 | CE-finetuned, single model | NDCG (x 100) | 74.47 | | Unverified
10 | 2 | NDCG (x 100) | 73.36 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 9xFGA (VGG) | MRR | 68.92 | | Unverified
2 | DAN | MRR | 66.38 | | Unverified
3 | CorefNMN (ResNet-152) | MRR | 64.1 | | Unverified
4 | CoAtt | MRR | 63.98 | | Unverified
5 | CorefNMN | MRR | 63.6 | | Unverified
6 | DualVD | MRR | 62.94 | | Unverified
7 | SF-QIH-se-2 | MRR | 62.42 | | Unverified
8 | HCIAE-NP-ATT | MRR | 62.22 | | Unverified
9 | HieCoAtt-QI | MRR | 57.88 | | Unverified
10 | AMEM | R@1 | 48.53 | | Unverified
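
For readers unfamiliar with the ranking metrics in these tables, the standard definitions of MRR (mean reciprocal rank) and R@k (recall at k) can be sketched as follows. This is a minimal illustration of the general definitions, not the evaluation code behind any listed result; the function names and the sample ranks are made up for the example. Note that some rows above appear to report these metrics on a 0-100 scale, while the sketch returns values in [0, 1].

```python
from typing import Sequence


def mean_reciprocal_rank(gt_ranks: Sequence[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the
    ground-truth answer in the model's ranked candidate list."""
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def recall_at_k(gt_ranks: Sequence[int], k: int) -> float:
    """Fraction of questions whose ground-truth answer is ranked in the top k."""
    return sum(1 for r in gt_ranks if r <= k) / len(gt_ranks)


# Hypothetical ranks of the ground-truth answer for four questions.
ranks = [1, 2, 4, 1]
print(mean_reciprocal_rank(ranks))  # (1 + 0.5 + 0.25 + 1) / 4 = 0.6875
print(recall_at_k(ranks, 1))        # 2 of 4 ranked first = 0.5
```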

# | Model | Metric | Claimed | Verified | Status
1 | 5xFGA + LS | NDCG | 64.04 | | Unverified
2 | 5xFGA + LS*+ | MRR | 0.71 | | Unverified
3 | Two-Step | MRR | 0.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 40 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 2.2 | | Unverified