Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Mar 27, 2024 | Image Classification Image Comprehension | Code Available | 7
Hawk: Learning to Understand Open-World Video Anomalies | May 27, 2024 | Anomaly Detection Question Answering | Code Available | 3
Unified Multimodal Model with Unlikelihood Training for Visual Dialog | Nov 23, 2022 | Answer Generation Chatbot | Code Available | 1
Video Dialog as Conversation about Objects Living in Space-Time | Jul 8, 2022 | Object Relational Reasoning | Code Available | 1
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution | May 29, 2022 | AI Agent coreference-resolution | Code Available | 1
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | May 25, 2022 | Conditional Text Generation Out-of-Distribution Detection | Code Available | 1
Ensemble of MRR and NDCG models for Visual Dialog | Apr 15, 2021 | AI Agent Visual Dialog | Code Available | 1
Where Are You? Localization from Embodied Dialog | Nov 16, 2020 | Navigate Visual Dialog | Code Available | 1
History for Visual Dialog: Do we really need it? | May 8, 2020 | Visual Dialog | Code Available | 1
Multi-View Attention Network for Visual Dialog | Apr 29, 2020 | Visual Dialog | Code Available | 1
VD-BERT: A Unified Vision and Dialog Transformer with BERT | Apr 28, 2020 | Answer Generation Visual Dialog | Code Available | 1
Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer | Apr 14, 2020 | Graph Learning Graph structure learning | Code Available | 1
Iterative Context-Aware Graph Inference for Visual Dialog | Apr 5, 2020 | Graph Attention Graph Embedding | Code Available | 1
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | Dec 5, 2019 | Language Modelling Representation Learning | Code Available | 1
An Annotated Corpus of Reference Resolution for Interpreting Common Grounding | Nov 18, 2019 | Coreference Resolution Goal-Oriented Dialog | Code Available | 1
Visual Dialogue State Tracking for Question Generation | Nov 12, 2019 | Dialogue State Tracking Question Generation | Code Available | 1
Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation | Feb 22, 2019 | Question Generation Question-Generation | Code Available | 1
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 | Jun 1, 2018 | Video Description Visual Dialog | Code Available | 1
Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog | Feb 12, 2018 | Goal-Oriented Dialog Reinforcement Learning | Code Available | 1
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning | Mar 20, 2017 | Deep Reinforcement Learning reinforcement-learning | Code Available | 1
Visual Dialog | Nov 26, 2016 | AI Agent Chatbot | Code Available | 1
Hierarchical Question-Image Co-Attention for Visual Question Answering | May 31, 2016 | Visual Dialog Visual Question Answering | Code Available | 1
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | Mar 3, 2025 | Contrastive Learning Text Retrieval | Unverified | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | Jan 1, 2025 | Contrastive Learning Text Retrieval | Unverified | 0
Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations | Aug 13, 2024 | dialog state tracking Dialogue State Tracking | Unverified | 0
ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report | Jul 13, 2024 | Explanation Generation Language Modeling | Unverified | 0
FlexCap: Describe Anything in Images in Controllable Detail | Mar 18, 2024 | Attribute Dense Captioning | Unverified | 0
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs | Oct 25, 2023 | Visual Dialog | Unverified | 0
Collecting Visually-Grounded Dialogue with A Game Of Sorts | Sep 10, 2023 | Coreference Resolution Image Retrieval | Code Available | 0
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations | Aug 30, 2023 | Explanation Generation Question Answering | Unverified | 0
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts | May 24, 2023 | Dialogue State Tracking Image Retrieval | Code Available | 0
A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation Factual Visual Question Answering | Unverified | 0
Knowledge Transfer with Visual Prompt in multi-modal Dialogue Understanding and Generation | Oct 1, 2022 | Dialogue Understanding Knowledge Distillation | Unverified | 0
LAVIS: A Library for Language-Vision Intelligence | Sep 15, 2022 | Benchmarking Image Captioning | Code Available | 0
Adversarial Robustness of Visual Dialog | Jul 6, 2022 | Adversarial Robustness Visual Dialog | Unverified | 0
ENRICH4ALL: A First Luxembourgish BERT Model for a Multilingual Chatbot | Jun 1, 2022 | Chatbot Language Modeling | Unverified | 0
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog | May 1, 2022 | Contrastive Learning Representation Learning | Unverified | 0
Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | Apr 15, 2022 | Contrastive Learning Question Answering | Unverified | 0
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog | Apr 10, 2022 | Logical Reasoning Sentence | Unverified | 0
Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene | Mar 16, 2022 | Visual Dialog | Code Available | 0
Modeling Coreference Relations in Visual Dialog | Mar 6, 2022 | Question Answering Visual Dialog | Unverified | 0
VU-BERT: A Unified framework for Visual Dialog | Feb 22, 2022 | Language Modeling Language Modelling | Unverified | 0
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions | Jan 17, 2022 | Video Captioning Visual Dialog | Unverified | 0
How to Fool Systems and Humans in Visually Grounded Interaction: A Case Study on Adversarial Attacks on Visual Dialog | Jan 16, 2022 | Visual Dialog | Unverified | 0
UNITER-Based Situated Coreference Resolution with Rich Multimodal Input | Dec 7, 2021 | coreference-resolution Coreference Resolution | Code Available | 0
Region under Discussion for visual dialog | Nov 1, 2021 | Visual Dialog | Unverified | 0
Enriching Language Models with Visually-grounded Word Vectors and the Lancaster Sensorimotor Norms | Nov 1, 2021 | Visual Dialog | Unverified | 0
Perceptual Score: What Data Modalities Does Your Model Perceive? | Oct 27, 2021 | Question Answering Visual Dialog | Code Available | 0
ViDA-MAN: Visual Dialog with Digital Humans | Oct 26, 2021 | speech-recognition Speech Recognition | Unverified | 0
Evaluating and Improving Interactions with Hazy Oracles | Oct 19, 2021 | Object Tracking Referring Expression | Unverified | 0