| On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications | Dec 23, 2023 | geo-localization, image-classification | Unverified | 0 |
| On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering | Aug 28, 2021 | Graph Attention, Question Answering | Unverified | 0 |
| On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law | May 19, 2020 | Model Selection, Question Answering | Unverified | 0 |
| Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation | Nov 11, 2019 | Domain Adaptation, Question Answering | Unverified | 0 |
| Optimizing Explanations by Network Canonization and Hyperparameter Search | Nov 30, 2022 | Explainable Artificial Intelligence (XAI), image-classification | Unverified | 0 |
| Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns | Jun 13, 2024 | Autonomous Driving, Question Answering | Unverified | 0 |
| Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation | Aug 7, 2024 | GPU, Question Answering | Unverified | 0 |
| Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models | Oct 22, 2024 | In-Context Learning, Question Answering | Unverified | 0 |
| ORD: Object Relationship Discovery for Visual Dialogue Generation | Jun 15, 2020 | Dialogue Generation, Graph Attention | Unverified | 0 |
| ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation | Mar 25, 2025 | Action Generation, Autonomous Driving | Unverified | 0 |
| Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering | Nov 1, 2018 | Factual Visual Question Answering, General Knowledge | Unverified | 0 |
| Overcoming Language Bias in Remote Sensing Visual Question Answering via Adversarial Training | Jun 1, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation | Jan 10, 2025 | Knowledge Distillation, Question Answering | Unverified | 0 |
| Overcoming Language Priors in Visual Question Answering with Adversarial Regularization | Oct 8, 2018 | Question Answering, Visual Grounding | Unverified | 0 |
| Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | Dec 15, 2024 | Image Captioning, Medical Question Answering | Unverified | 0 |
| NAAQA: A Neural Architecture for Acoustic Question Answering | Jun 11, 2021 | Acoustic Question Answering, Question Answering | Code Available | 0 |
| Weakly Supervised Relative Spatial Reasoning for Visual Question Answering | Sep 4, 2021 | Question Answering, Spatial Reasoning | Code Available | 0 |
| CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts | Oct 20, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| MUTAN: Multimodal Tucker Fusion for Visual Question Answering | May 18, 2017 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 |
| Focal Visual-Text Attention for Memex Question Answering | Dec 14, 2018 | Memex Question Answering, Question Answering | Code Available | 0 |
| FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering | Dec 9, 2024 | Knowledge Distillation, Question Answering | Code Available | 0 |
| Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Feb 3, 2021 | Question Answering, Visual Grounding | Code Available | 0 |
| NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Dec 20, 2024 | Compositional Generalization (AVG), Novel Concepts | Code Available | 0 |
| Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation | May 1, 2025 | Question Answering, Specificity | Code Available | 0 |
| X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering | Jul 24, 2021 | Attribute, Out-of-Distribution Generalization | Code Available | 0 |
| Neural Module Networks | Nov 9, 2015 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 |
| Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models | Oct 1, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | Oct 4, 2018 | Question Answering, Representation Learning | Code Available | 0 |
| Answering Questions about Data Visualizations using Efficient Bimodal Fusion | Aug 5, 2019 | Chart Question Answering, Optical Character Recognition | Code Available | 0 |
| Structured Attentions for Visual Question Answering | Aug 7, 2017 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 |
| Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering | Jan 24, 2018 | Multiple-choice, POS | Code Available | 0 |
| What Can Neural Networks Reason About? | May 30, 2019 | Question Answering, Visual Question Answering | Code Available | 0 |
| Counting Everyday Objects in Everyday Scenes | Apr 12, 2016 | Object, Object Counting | Code Available | 0 |
| AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care | May 1, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Visual Reasoning with Multi-hop Feature Modulation | Aug 3, 2018 | Question Answering, Visual Dialog | Code Available | 0 |
| Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions | Nov 20, 2023 | Question Answering, Visual Question Answering | Code Available | 0 |
| No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Feb 6, 2025 | Continual Learning, Question Answering | Code Available | 0 |
| Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | Mar 6, 2020 | Density Estimation, Noise Estimation | Code Available | 0 |
| Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models | Dec 19, 2024 | Autonomous Driving, Image Captioning | Code Available | 0 |
| SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks | Nov 29, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Few-Shot Multimodal Explanation for Visual Question Answering | Oct 28, 2024 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | Code Available | 0 |
| Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | May 27, 2025 | Audio-visual Question Answering, Question Answering | Code Available | 0 |
| Zero-shot Commonsense Reasoning over Machine Imagination | Oct 12, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| MUREL: Multimodal Relational Reasoning for Visual Question Answering | Feb 25, 2019 | Relational Reasoning, Visual Question Answering | Code Available | 0 |
| Multi-Sourced Compositional Generalization in Visual Question Answering | May 29, 2025 | Question Answering, Visual Question Answering | Code Available | 0 |
| Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering | Sep 23, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Object Attribute Matters in Visual Question Answering | Dec 20, 2023 | Attribute, Graph Neural Network | Code Available | 0 |
| Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering | Dec 20, 2023 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 0 |
| What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning | May 5, 2022 | Multi-Task Learning, Question Answering | Code Available | 0 |
| Visual Robustness Benchmark for Visual Question Answering (VQA) | Jul 3, 2024 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 |