SOTAVerified

Visual Dialog

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
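
As a rough illustration of the task inputs described above, the following is a minimal sketch of how a single example might be represented. The class name `VisualDialogExample`, its fields, and the helper `build_text_context` are hypothetical, chosen only for illustration; they do not come from any particular dataset loader or model listed on this page.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """Hypothetical container for one Visual Dialog query (illustrative only)."""
    image_path: str                       # the image under discussion
    history: List[Tuple[str, str]] = field(default_factory=list)  # past (question, answer) turns
    question: str = ""                    # the follow-up question to answer


def build_text_context(example: VisualDialogExample, sep: str = " ") -> str:
    """Flatten the dialog history and the follow-up question into one string.

    A model would consume this text together with features from the image.
    """
    turns = [f"Q: {q} A: {a}" for q, a in example.history]
    turns.append(f"Q: {example.question}")
    return sep.join(turns)


example = VisualDialogExample(
    image_path="images/example.jpg",  # placeholder path, assumed for the sketch
    history=[("What is in the picture?", "A dog on a beach.")],
    question="What color is it?",
)
print(build_text_context(example))
```

Answering "What color is it?" here depends on both the image and the earlier turn that introduced the dog, which is what distinguishes the task from single-turn visual question answering.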

Papers

Showing 51–100 of 118 papers

Title | Status | Hype
Effective questions in referential visual dialogue | | 0
ORD: Object Relationship Discovery for Visual Dialogue Generation | | 0
History for Visual Dialog: Do we really need it? | Code | 1
Multi-View Attention Network for Visual Dialog | Code | 1
VD-BERT: A Unified Vision and Dialog Transformer with BERT | Code | 1
Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer | Code | 1
Iterative Context-Aware Graph Inference for Visual Dialog | Code | 1
Modality-Balanced Models for Visual Dialogue | | 0
Ensemble based discriminative models for Visual Dialog Challenge 2018 | | 0
Vision and Language: from Visual Perception to Content Creation | | 0
DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog | Code | 0
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | Code | 1
TAB-VCR: Tags and Attributes based VCR Baselines | Code | 0
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs | Code | 0
Two Causal Principles for Improving Visual Dialog | Code | 0
An Annotated Corpus of Reference Resolution for Interpreting Common Grounding | Code | 1
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | Code | 0
Visual Dialogue State Tracking for Question Generation | Code | 1
Video Dialog via Progressive Inference and Cross-Transformer | | 0
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines | Code | 0
Granular Multimodal Attention Networks for Visual Dialog | | 0
Improving Generative Visual Dialog by Answering Diverse Questions | Code | 0
On Controlled DeEntanglement for Natural Language Processing | | 0
Probabilistic framework for solving Visual Dialog | | 0
Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation | | 0
Grounded Agreement Games: Emphasizing Conversational Grounding in Visual Dialogue Settings | | 0
Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling | | 0
What Should I Ask? Using Conversationally Informative Rewards for Goal-Oriented Visual Dialog | | 0
Learning Goal-Oriented Visual Dialog Agents: Imitating and Surpassing Analytic Experts | | 0
What Should I Ask? Using Conversationally Informative Rewards for Goal-oriented Visual Dialog. | | 0
The World in My Mind: Visual Dialog with Adversarial Multi-modal Feature Encoding | | 0
A Generative Adversarial Density Estimator | | 0
Factor Graph Attention | Code | 0
Reasoning Visual Dialogs with Structural and Partial Observations | Code | 0
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog | Code | 0
Discourse Parsing in Videos: A Multi-modal Appraoch | Code | 0
Generative Visual Dialogue System via Adaptive Reasoning and Weighted Likelihood Estimation | | 0
Image-Question-Answer Synergistic Network for Visual Dialog | | 0
Making History Matter: History-Advantage Sequence Training for Visual Dialog | | 0
Dual Attention Networks for Visual Reference Resolution in Visual Dialog | Code | 0
Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation | Code | 1
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog | | 0
Response to "Visual Dialogue without Vision or Dialogue" (Massiceti et al., 2018) | | 0
Visual Dialogue without Vision or Dialogue | Code | 0
Gold Seeker: Information Gain from Policy Distributions for Goal-oriented Vision-and-Langauge Reasoning | | 0
What's to know? Uncertainty as a Guide to Asking Goal-oriented Questions | | 0
PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding | | 0
Recursive Visual Attention in Visual Dialog | Code | 0
Visual Coreference Resolution in Visual Dialog using Neural Module Networks | Code | 0
Visual Reasoning with Multi-hop Feature Modulation | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Single | NDCG (x 100) | 78.7 | | Unverified
2 | P1P2+Distill+Ensemble | NDCG (x 100) | 77.92 | | Unverified
3 | Ensemble + Fine-tuning | NDCG (x 100) | 76.43 | | Unverified
4 | ensemble, finetune | NDCG (x 100) | 76.17 | | Unverified
5 | VD-PCR | NDCG (x 100) | 76.14 | | Unverified
6 | Ensemble | NDCG (x 100) | 75.35 | | Unverified
7 | Ensemble + Finetune | NDCG (x 100) | 74.88 | | Unverified
8 | bert-double-stream-finetuning | NDCG (x 100) | 74.62 | | Unverified
9 | CE-finetuned, single model | NDCG (x 100) | 74.47 | | Unverified
10 | 2 | NDCG (x 100) | 73.36 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 9xFGA (VGG) | MRR | 68.92 | | Unverified
2 | DAN | MRR | 66.38 | | Unverified
3 | CorefNMN (ResNet-152) | MRR | 64.1 | | Unverified
4 | CoAtt | MRR | 63.98 | | Unverified
5 | CorefNMN | MRR | 63.6 | | Unverified
6 | DualVD | MRR | 62.94 | | Unverified
7 | SF-QIH-se-2 | MRR | 62.42 | | Unverified
8 | HCIAE-NP-ATT | MRR | 62.22 | | Unverified
9 | HieCoAtt-QI | MRR | 57.88 | | Unverified
10 | AMEM | R@1 | 48.53 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 5xFGA + LS | NDCG | 64.04 | | Unverified
2 | 5xFGA + LS*+ | MRR | 0.71 | | Unverified
3 | Two-Step | MRR | 0.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 40 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 2.2 | | Unverified
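
The ranking metrics named in the tables above (MRR, R@1, NDCG) are not defined on this page. The sketch below uses their standard textbook definitions and assumes each model ranks a list of candidate answers per question. The function names are illustrative, and the exact protocol of any given benchmark (for example, how the NDCG cutoff `k` is chosen) may differ. Note also that the tables report scores on both 0-1 and 0-100 scales, while these functions return values in [0, 1].

```python
import math
from typing import Sequence


def mean_reciprocal_rank(gt_ranks: Sequence[int]) -> float:
    """Average of 1/rank, where rank is the 1-indexed position of the
    ground-truth answer in the model's ranking for each question."""
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def recall_at_k(gt_ranks: Sequence[int], k: int) -> float:
    """Fraction of questions whose ground-truth answer is ranked in the top k."""
    return sum(1 for r in gt_ranks if r <= k) / len(gt_ranks)


def ndcg(relevance_in_ranked_order: Sequence[float], k: int) -> float:
    """Normalized discounted cumulative gain over the top-k ranked candidates.

    `relevance_in_ranked_order[i]` is the relevance score of the candidate
    the model placed at position i (0-indexed).
    """
    def dcg(rels: Sequence[float]) -> float:
        # Standard log2 position discount: position i contributes rel / log2(i + 2).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevance_in_ranked_order, reverse=True))
    return dcg(relevance_in_ranked_order) / ideal if ideal > 0 else 0.0


# Toy ranks of the ground-truth answer for three questions (made-up values).
ranks = [1, 2, 4]
print(mean_reciprocal_rank(ranks))
print(recall_at_k(ranks, k=1))
print(ndcg([1.0, 0.0, 0.5], k=3))
```

MRR and R@k depend only on where the single ground-truth answer lands, whereas NDCG rewards placing any highly relevant candidate near the top, so the two families of scores are not directly comparable across tables.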