| Attribute Diversity Determines the Systematicity Gap in VQA | Nov 15, 2023 | Attribute, Diagnostic | Code Available | 0 |
| Asking More Informative Questions for Grounded Retrieval | Nov 14, 2023 | Question Answering, Question Selection | Unverified | 0 |
| What Large Language Models Bring to Text-rich VQA? | Nov 13, 2023 | Image Comprehension, Optical Character Recognition (OCR) | Unverified | 0 |
| Visual Commonsense based Heterogeneous Graph Contrastive Learning | Nov 11, 2023 | Contrastive Learning, Question Answering | Unverified | 0 |
| Zero-shot Translation of Attention Patterns in VQA Models to Natural Language | Nov 8, 2023 | Image Captioning, Language Modeling | Code Available | 0 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | Nov 6, 2023 | CoLA, Question Answering | Unverified | 0 |
| From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities | Nov 1, 2023 | Navigate, Question Answering | Unverified | 0 |
| VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization | Nov 1, 2023 | Domain Generalization, Question Answering | Unverified | 0 |
| A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis | Oct 31, 2023 | Descriptive, Medical Image Analysis | Unverified | 0 |
| Learning to Follow Object-Centric Image Editing Instructions Faithfully | Oct 29, 2023 | Object, Question Answering | Code Available | 0 |
| Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery | Oct 29, 2023 | Deep Learning, Multimodal Deep Learning | Code Available | 0 |
| ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese | Oct 27, 2023 | Information Retrieval, Natural Language Queries | Code Available | 0 |
| Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation | Oct 27, 2023 | Image Generation, Question Answering | Unverified | 0 |
| Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs | Oct 26, 2023 | Attribute, Machine Translation | Code Available | 0 |
| CAD -- Contextual Multi-modal Alignment for Dynamic AVQA | Oct 25, 2023 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Unverified | 0 |
| Exploring Question Decomposition for Zero-Shot VQA | Oct 25, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents | Oct 25, 2023 | All, Document Classification | Unverified | 0 |
| Multimodal Representations for Teacher-Guided Compositional Visual Reasoning | Oct 24, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond | Oct 23, 2023 | counterfactual, Multiple-choice | Unverified | 0 |
| LXMERT Model Compression for Visual Question Answering | Oct 23, 2023 | model, Model Compression | Code Available | 0 |
| SILC: Improving Vision Language Pretraining with Self-Distillation | Oct 20, 2023 | Classification, Contrastive Learning | Unverified | 0 |
| A Simple Baseline for Knowledge-Based Visual Question Answering | Oct 20, 2023 | In-Context Learning, Question Answering | Code Available | 0 |
| RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering | Oct 19, 2023 | Image Captioning, Question Answering | Code Available | 0 |
| UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models | Oct 17, 2023 | Attribute, Question Answering | Code Available | 0 |
| Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA | Oct 13, 2023 | Graph Learning, Object | Unverified | 0 |