SOTAVerified

Visual Dialog

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.

Papers

Showing 51-100 of 118 papers

| Title | Status | Hype |
| --- | --- | --- |
| TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines | Code | 0 |
| TAB-VCR: Tags and Attributes based VCR Baselines | Code | 0 |
| Ask No More: Deciding when to guess in referential visual dialogue | Code | 0 |
| DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog | Code | 0 |
| Dual Attention Networks for Visual Reference Resolution in Visual Dialog | Code | 0 |
| DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | Code | 0 |
| Visual Reference Resolution using Attention Memory for Visual Dialog | | 0 |
| Visual-Textual Alignment for Graph Inference in Visual Dialog | | 0 |
| VU-BERT: A Unified framework for Visual Dialog | | 0 |
| What Should I Ask? Using Conversationally Informative Rewards for Goal-oriented Visual Dialog. | | 0 |
| What Should I Ask? Using Conversationally Informative Rewards for Goal-Oriented Visual Dialog | | 0 |
| What's to know? Uncertainty as a Guide to Asking Goal-oriented Questions | | 0 |
| Video Dialog via Progressive Inference and Cross-Transformer | | 0 |
| Adversarial Robustness of Visual Dialog | | 0 |
| Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations | | 0 |
| A Generative Adversarial Density Estimator | | 0 |
| Gold Seeker: Information Gain from Policy Distributions for Goal-oriented Vision-and-Langauge Reasoning | | 0 |
| Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning | | 0 |
| A survey on knowledge-enhanced multimodal learning | | 0 |
| Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation | | 0 |
| Evaluating and Improving Interactions with Hazy Oracles | | 0 |
| Connecting Language and Vision to Actions | | 0 |
| Discourse Analysis for Evaluating Coherence in Video Paragraph Captions | | 0 |
| Effective questions in referential visual dialogue | | 0 |
| Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations | | 0 |
| ENRICH4ALL: A First Luxembourgish BERT Model for a Multilingual Chatbot | | 0 |
| Enriching Language Models with Visually-grounded Word Vectors and the Lancaster Sensorimotor Norms | | 0 |
| Ensemble based discriminative models for Visual Dialog Challenge 2018 | | 0 |
| FlexCap: Describe Anything in Images in Controllable Detail | | 0 |
| FlipDial: A Generative Model for Two-Way Visual Dialogue | | 0 |
| Generative Visual Dialogue System via Adaptive Reasoning and Weighted Likelihood Estimation | | 0 |
| GoG: Relation-aware Graph-over-Graph Network for Visual Dialog | | 0 |
| Granular Multimodal Attention Networks for Visual Dialog | | 0 |
| Grounded Agreement Games: Emphasizing Conversational Grounding in Visual Dialogue Settings | | 0 |
| How to Fool Systems and Humans in Visually Grounded Interaction: A Case Study on Adversarial Attacks on Visual Dialog | | 0 |
| ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report | | 0 |
| Image-Question-Answer Synergistic Network for Visual Dialog | | 0 |
| Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | | 0 |
| Knowledge Transfer with Visual Prompt in multi-modal Dialogue Understanding and Generation | | 0 |
| Learning Goal-Oriented Visual Dialog Agents: Imitating and Surpassing Analytic Experts | | 0 |
| Learning to Ground Visual Objects for Visual Dialog | | 0 |
| Making History Matter: History-Advantage Sequence Training for Visual Dialog | | 0 |
| VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs | | 0 |
| Modality-Balanced Models for Visual Dialogue | | 0 |
| Modeling Coreference Relations in Visual Dialog | | 0 |
| Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog | | 0 |
| Multi-Modal Open-Domain Dialogue | | 0 |
| Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog | | 0 |
| ORD: Object Relationship Discovery for Visual Dialogue Generation | | 0 |
| PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Single | NDCG (x 100) | 78.7 | | Unverified |
| 2 | P1P2+Distill+Ensemble | NDCG (x 100) | 77.92 | | Unverified |
| 3 | Ensemble + Fine-tuning | NDCG (x 100) | 76.43 | | Unverified |
| 4 | ensemble, finetune | NDCG (x 100) | 76.17 | | Unverified |
| 5 | VD-PCR | NDCG (x 100) | 76.14 | | Unverified |
| 6 | Ensemble | NDCG (x 100) | 75.35 | | Unverified |
| 7 | Ensemble + Finetune | NDCG (x 100) | 74.88 | | Unverified |
| 8 | bert-double-stream-finetuning | NDCG (x 100) | 74.62 | | Unverified |
| 9 | CE-finetuned, single model | NDCG (x 100) | 74.47 | | Unverified |
| 10 | 2 | NDCG (x 100) | 73.36 | | Unverified |
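
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | 9xFGA (VGG) | MRR | 68.92 | | Unverified |
| 2 | DAN | MRR | 66.38 | | Unverified |
| 3 | CorefNMN (ResNet-152) | MRR | 64.1 | | Unverified |
| 4 | CoAtt | MRR | 63.98 | | Unverified |
| 5 | CorefNMN | MRR | 63.6 | | Unverified |
| 6 | DualVD | MRR | 62.94 | | Unverified |
| 7 | SF-QIH-se-2 | MRR | 62.42 | | Unverified |
| 8 | HCIAE-NP-ATT | MRR | 62.22 | | Unverified |
| 9 | HieCoAtt-QI | MRR | 57.88 | | Unverified |
| 10 | AMEM | R@1 | 48.53 | | Unverified |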
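
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | 5xFGA + LS | NDCG | 64.04 | | Unverified |
| 2 | 5xFGA + LS*+ | MRR | 0.71 | | Unverified |
| 3 | Two-Step | MRR | 0.7 | | Unverified |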
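
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Multi-Modal BlenderBot | BLEU-4 | 1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Multi-Modal BlenderBot | BLEU-4 | 1.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Multi-Modal BlenderBot | BLEU-4 | 1.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Multi-Modal BlenderBot | BLEU-4 | 40 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Multi-Modal BlenderBot | BLEU-4 | 2.2 | | Unverified |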