SOTAVerified

Visual Dialog

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
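The sketch below makes the task's inputs and output concrete. It assumes the discriminative setting that is common on VisDial-style benchmarks, where the agent ranks a list of candidate answers (VisDial provides 100 per question) instead of generating free text, and where the dialog history starts with the image caption. The class `VisualDialogExample`, the functions `rank_candidates` and `overlap_scorer`, and the sample data are all hypothetical. The toy scorer ignores the image completely, so it only illustrates the interface and is not a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VisualDialogExample:
    """One turn: an image, the dialog so far, a follow-up question, candidate answers."""
    image_id: str                        # hypothetical image identifier
    caption: str                         # VisDial-style history begins with a caption
    history: List[Tuple[str, str]]       # previous (question, answer) rounds
    question: str                        # the follow-up question to answer
    candidates: List[str] = field(default_factory=list)  # answers to rank


# A scorer maps (example, candidate) to a score; higher means a better answer.
Scorer = Callable[[VisualDialogExample, str], float]


def rank_candidates(example: VisualDialogExample, scorer: Scorer) -> List[int]:
    """Return candidate indices ordered from best to worst score."""
    scores = [scorer(example, c) for c in example.candidates]
    # sorted() is stable, so tied candidates keep their original order
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def overlap_scorer(example: VisualDialogExample, candidate: str) -> float:
    """Hypothetical text-only baseline: count candidate words seen in the dialog context."""
    context = [example.caption, example.question]
    for q, a in example.history:
        context.extend([q, a])
    context_words = set(" ".join(context).lower().split())
    return float(sum(w in context_words for w in candidate.lower().split()))


ex = VisualDialogExample(
    image_id="img_001",
    caption="a man riding a red bike",
    history=[("is he wearing a helmet?", "yes")],
    question="what color is the bike?",
    candidates=["blue", "it is red", "no"],
)
print(rank_candidates(ex, overlap_scorer))  # prints [1, 0, 2]
```

A real model would replace `overlap_scorer` with a learned function of the image, history, and question; the ranking interface stays the same, which is what the MRR, R@k, and NDCG metrics in the leaderboards below evaluate.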

Papers

Showing 1-50 of 118 papers

Title | Status | Hype
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | - | 0
Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations | - | 0
ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report | - | 0
Hawk: Learning to Understand Open-World Video Anomalies | Code | 3
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7
FlexCap: Describe Anything in Images in Controllable Detail | - | 0
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs | - | 0
Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations | - | 0
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts | Code | 0
Unified Multimodal Model with Unlikelihood Training for Visual Dialog | Code | 1
A survey on knowledge-enhanced multimodal learning | - | 0
Knowledge Transfer with Visual Prompt in multi-modal Dialogue Understanding and Generation | - | 0
LAVIS: A Library for Language-Vision Intelligence | Code | 0
Video Dialog as Conversation about Objects Living in Space-Time | Code | 1
Adversarial Robustness of Visual Dialog | - | 0
ENRICH4ALL: A First Luxembourgish BERT Model for a Multilingual Chatbot | - | 0
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution | Code | 1
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | Code | 1
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog | - | 0
Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | - | 0
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog | - | 0
Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene | Code | 0
Modeling Coreference Relations in Visual Dialog | - | 0
VU-BERT: A Unified framework for Visual Dialog | - | 0
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions | - | 0
How to Fool Systems and Humans in Visually Grounded Interaction: A Case Study on Adversarial Attacks on Visual Dialog | - | 0
UNITER-Based Situated Coreference Resolution with Rich Multimodal Input | Code | 0
Region under Discussion for visual dialog | - | 0
Enriching Language Models with Visually-grounded Word Vectors and the Lancaster Sensorimotor Norms | - | 0
Perceptual Score: What Data Modalities Does Your Model Perceive? | Code | 0
ViDA-MAN: Visual Dialog with Digital Humans | - | 0
Evaluating and Improving Interactions with Hazy Oracles | - | 0
The Impact of Answers in Referential Visual Dialog | - | 0
Variational Disentangled Attention for Regularized Visual Dialog | - | 0
GoG: Relation-aware Graph-over-Graph Network for Visual Dialog | - | 0
Learning to Ground Visual Objects for Visual Dialog | - | 0
Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser | Code | 0
SeqDialN: Sequential Visual Dialog Network in Joint Visual-Linguistic Representation Space | Code | 0
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | Code | 0
Ensemble of MRR and NDCG models for Visual Dialog | Code | 1
Visual-Textual Alignment for Graph Inference in Visual Dialog | - | 0
Where Are You? Localization from Embodied Dialog | Code | 1
Reasoning Over History: Context Aware Visual Dialog | - | 0
Multi-Modal Open-Domain Dialogue | - | 0
Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue | Code | 0
SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space | Code | 0
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data | Code | 0
Effective questions in referential visual dialogue | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Single | NDCG (x 100) | 78.7 | - | Unverified
2 | P1P2+Distill+Ensemble | NDCG (x 100) | 77.92 | - | Unverified
3 | Ensemble + Fine-tuning | NDCG (x 100) | 76.43 | - | Unverified
4 | ensemble, finetune | NDCG (x 100) | 76.17 | - | Unverified
5 | VD-PCR | NDCG (x 100) | 76.14 | - | Unverified
6 | Ensemble | NDCG (x 100) | 75.35 | - | Unverified
7 | Ensemble + Finetune | NDCG (x 100) | 74.88 | - | Unverified
8 | bert-double-stream-finetuning | NDCG (x 100) | 74.62 | - | Unverified
9 | CE-finetuned, single model | NDCG (x 100) | 74.47 | - | Unverified
10 | 2 | NDCG (x 100) | 73.36 | - | Unverified
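The NDCG leaderboard rewards models that place answers humans judged relevant near the top of the ranking. The sketch below computes NDCG for one question from a model's ranking and per-candidate relevance scores. It assumes the convention usually described for the VisDial v1.0 evaluation: linear gains, a log2(position + 1) discount, and a cut-off k equal to the number of candidates with nonzero relevance. The function names are hypothetical, and scoring a question with no relevant candidate as 0 is an assumption of this sketch.

```python
import math
from typing import List, Sequence


def ndcg(ranked_indices: Sequence[int], relevance: Sequence[float]) -> float:
    """NDCG for one question.

    ranked_indices: candidate indices ordered best-first by the model.
    relevance: human relevance score per candidate (0 means not relevant).
    """
    # Assumed cut-off: number of candidates with nonzero relevance.
    k = sum(1 for r in relevance if r > 0)
    if k == 0:
        return 0.0  # assumption: undefined case scored as 0

    def dcg(gains: List[float]) -> float:
        # Linear gain, discounted by log2(position + 1) with positions starting at 1.
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    predicted = [relevance[i] for i in ranked_indices]  # gains in model order
    ideal = sorted(relevance, reverse=True)             # best possible order
    return dcg(predicted) / dcg(ideal)


def mean_ndcg_x100(rankings: Sequence[Sequence[int]],
                   relevances: Sequence[Sequence[float]]) -> float:
    """Average NDCG over questions, scaled by 100 as in the table header."""
    scores = [ndcg(r, rel) for r, rel in zip(rankings, relevances)]
    return 100.0 * sum(scores) / len(scores)


relevance = [0.0, 1.0, 0.5, 0.0]
print(ndcg([1, 2, 0, 3], relevance))  # ideal order, prints 1.0
print(ndcg([0, 2, 1, 3], relevance))  # roughly 0.24
```

Because k depends on how many candidates are relevant, a model is not penalized for the order of irrelevant answers below that cut-off.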
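
# | Model | Metric | Claimed | Verified | Status
1 | 9xFGA (VGG) | MRR | 68.92 | - | Unverified
2 | DAN | MRR | 66.38 | - | Unverified
3 | CorefNMN (ResNet-152) | MRR | 64.1 | - | Unverified
4 | CoAtt | MRR | 63.98 | - | Unverified
5 | CorefNMN | MRR | 63.6 | - | Unverified
6 | DualVD | MRR | 62.94 | - | Unverified
7 | SF-QIH-se-2 | MRR | 62.42 | - | Unverified
8 | HCIAE-NP-ATT | MRR | 62.22 | - | Unverified
9 | HieCoAtt-QI | MRR | 57.88 | - | Unverified
10 | AMEM | R@1 | 48.53 | - | Unverified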

# | Model | Metric | Claimed | Verified | Status
1 | 5xFGA + LS | NDCG | 64.04 | - | Unverified
2 | 5xFGA + LS*+ | MRR | 0.71 | - | Unverified
3 | Two-Step | MRR | 0.7 | - | Unverified
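MRR and R@k, used in the MRR leaderboards, depend only on the 1-based rank of the single ground-truth answer among the candidates. The following is a minimal sketch under that assumption. The function names are hypothetical, and mean rank is included because it is usually reported alongside these metrics. The leaderboards appear to mix scales: one reports MRR multiplied by 100 (for example 68.92), another as a fraction (0.71). This sketch returns fractions.

```python
from typing import Dict, Iterable, Sequence


def gt_rank(ranked_indices: Sequence[int], gt_index: int) -> int:
    """1-based position of the ground-truth answer in the model's ranking."""
    return list(ranked_indices).index(gt_index) + 1


def retrieval_metrics(ranks: Sequence[int],
                      ks: Iterable[int] = (1, 5, 10)) -> Dict[str, float]:
    """MRR, mean rank, and recall@k from ground-truth ranks (one per question)."""
    n = len(ranks)
    if n == 0:
        raise ValueError("need at least one rank")
    metrics = {
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
        "Mean rank": sum(ranks) / n,             # lower is better
    }
    for k in ks:
        # Fraction of questions whose ground truth lands in the top k.
        metrics[f"R@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


print(gt_rank([3, 0, 1, 2], 0))  # prints 2
print(retrieval_metrics([1, 2, 4, 20]))
```

Multiplying MRR and R@k by 100 gives the percentage-style numbers used in most rows of the MRR leaderboard.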

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 40 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 2.2 | - | Unverified
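The Multi-Modal BlenderBot entries use BLEU-4, which scores generated responses by n-gram overlap with a reference instead of by ranking candidates. Below is a minimal sketch of corpus-level BLEU-4. It assumes one reference per response, uniform weights over 1- to 4-grams, clipped n-gram counts, the standard brevity penalty, no smoothing, and pre-tokenized input. The function names are hypothetical, and scores are returned on a 0-100 scale. Toolkits such as sacreBLEU apply their own tokenization, so this sketch will not reproduce published numbers exactly.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses: List[List[str]], references: List[List[str]]) -> float:
    """Corpus BLEU-4 with a single reference per hypothesis, on a 0-100 scale."""
    matches = [0] * 4   # clipped n-gram matches for n = 1..4
    totals = [0] * 4    # hypothesis n-gram counts for n = 1..4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: credit each n-gram at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the four modified precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


ref = "the cat sat on the mat".split()
print(corpus_bleu4([ref], [ref]))                       # prints 100.0
print(corpus_bleu4([["the", "cat", "sat", "on"]], [ref]))  # brevity penalty applies
```

The second call has perfect n-gram precision but is shorter than its reference, so its score is reduced only by the brevity penalty, exp(1 - 6/4).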