SOTAVerified

Visual Dialog

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
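The task definition fixes what the agent sees at each turn: an image, the dialog so far, and a new question. It must produce an answer. The Python sketch below models only that interface. The names `Turn`, `DialogState`, `answer_follow_up` and `toy_model` are hypothetical and do not belong to any benchmark's API. The toy model stands in for a real vision-language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    """One question-answer exchange in the dialog history."""
    question: str
    answer: str


@dataclass
class DialogState:
    """What the agent can condition on: an image and the dialog so far."""
    image_id: str  # hypothetical identifier standing in for the image itself
    history: List[Turn] = field(default_factory=list)


# Hypothetical model signature: (image, dialog history, follow-up question) -> answer
AnswerFn = Callable[[str, List[Turn], str], str]


def answer_follow_up(state: DialogState, question: str, model: AnswerFn) -> str:
    """Answer a follow-up question, then append the exchange to the history."""
    # Pass a copy so the model sees only the turns that came before this question.
    answer = model(state.image_id, list(state.history), question)
    state.history.append(Turn(question, answer))
    return answer


def toy_model(image_id: str, history: List[Turn], question: str) -> str:
    # Placeholder "model": reports how many earlier turns it was given.
    return f"turns seen: {len(history)}"


state = DialogState(image_id="img_001")
answer_follow_up(state, "What is in the image?", toy_model)  # 'turns seen: 0'
answer_follow_up(state, "What color is it?", toy_model)      # 'turns seen: 1'
```

Each new answer is appended to the history. A later question such as "What color is it?" can therefore be resolved against earlier turns, which is what separates Visual Dialog from single-shot visual question answering.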

Papers

118 papers in total (first page shown)

Title | Status | Hype
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0
Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations | | 0
ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report | | 0
Hawk: Learning to Understand Open-World Video Anomalies | Code | 3
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7
FlexCap: Describe Anything in Images in Controllable Detail | | 0
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs | | 0
Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Multi-Modal BlenderBot | BLEU-4 | 1.1 | | Unverified
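The only reported metric here is BLEU-4. The page does not say which implementation, tokenization, reference set, scale or smoothing produced the claimed 1.1. The sketch below is therefore just the textbook sentence-level formulation, assuming a single reference and no smoothing. Its scores are on a 0 to 1 scale and need not match the leaderboard number. It computes the geometric mean of clipped n-gram precisions for n = 1..4 with uniform weights of 1/4, multiplied by the brevity penalty.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: List[str], reference: List[str]) -> float:
    """Sentence-level BLEU-4 against one reference, without smoothing."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        total = sum(cand.values())
        if total == 0:  # candidate shorter than n tokens
            return 0.0
        ref = ngrams(reference, n)
        # Modified precision: clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if clipped == 0:  # without smoothing, any zero precision zeroes the score
            return 0.0
        log_precision_sum += math.log(clipped / total) / 4  # uniform weights 1/4
    c, r = len(candidate), len(reference)
    # Brevity penalty: candidates no longer than the reference are penalized.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


score = bleu4("the cat sat on the mat".split(), "the cat sat on a mat".split())
```

In this example the clipped precisions are 5/6, 3/5, 2/4 and 1/3. The lengths are equal, so the brevity penalty is 1. The score is therefore (1/12)^(1/4), about 0.537. Library implementations typically add smoothing and corpus-level aggregation, which is one reason reported BLEU-4 values vary between papers.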