| On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications | Dec 23, 2023 | geo-localizationimage-classification | —Unverified | 0 |
| On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering | Aug 28, 2021 | Graph AttentionQuestion Answering | —Unverified | 0 |
| On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law | May 19, 2020 | Model SelectionQuestion Answering | —Unverified | 0 |
| Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation | Nov 11, 2019 | Domain AdaptationQuestion Answering | —Unverified | 0 |
| Optimizing Explanations by Network Canonization and Hyperparameter Search | Nov 30, 2022 | Explainable Artificial Intelligence (XAI)image-classification | —Unverified | 0 |
| Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns | Jun 13, 2024 | Autonomous DrivingQuestion Answering | —Unverified | 0 |
| Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation | Aug 7, 2024 | GPUQuestion Answering | —Unverified | 0 |
| Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models | Oct 22, 2024 | In-Context LearningQuestion Answering | —Unverified | 0 |
| ORD: Object Relationship Discovery for Visual Dialogue Generation | Jun 15, 2020 | Dialogue GenerationGraph Attention | —Unverified | 0 |
| ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation | Mar 25, 2025 | Action GenerationAutonomous Driving | —Unverified | 0 |
| Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering | Nov 1, 2018 | Factual Visual Question AnsweringGeneral Knowledge | —Unverified | 0 |
| Overcoming Language Bias in Remote Sensing Visual Question Answering via Adversarial Training | Jun 1, 2023 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation | Jan 10, 2025 | Knowledge DistillationQuestion Answering | —Unverified | 0 |
| Overcoming Language Priors in Visual Question Answering with Adversarial Regularization | Oct 8, 2018 | Question AnsweringVisual Grounding | —Unverified | 0 |
| Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | Dec 15, 2024 | Image CaptioningMedical Question Answering | —Unverified | 0 |
| NAAQA: A Neural Architecture for Acoustic Question Answering | Jun 11, 2021 | Acoustic Question AnsweringQuestion Answering | CodeCode Available | 0 |
| Weakly Supervised Relative Spatial Reasoning for Visual Question Answering | Sep 4, 2021 | Question AnsweringSpatial Reasoning | CodeCode Available | 0 |
| CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts | Oct 20, 2024 | Question AnsweringVisual Question Answering | CodeCode Available | 0 |
| MUTAN: Multimodal Tucker Fusion for Visual Question Answering | May 18, 2017 | Visual Question AnsweringVisual Question Answering (VQA) | CodeCode Available | 0 |
| Focal Visual-Text Attention for Memex Question Answering | Dec 14, 2018 | Memex Question AnsweringQuestion Answering | CodeCode Available | 0 |
| FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering | Dec 9, 2024 | Knowledge DistillationQuestion Answering | CodeCode Available | 0 |
| Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Feb 3, 2021 | Question AnsweringVisual Grounding | CodeCode Available | 0 |
| NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Dec 20, 2024 | Compositional Generalization (AVG)Novel Concepts | CodeCode Available | 0 |
| Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation | May 1, 2025 | Question AnsweringSpecificity | CodeCode Available | 0 |
| X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering | Jul 24, 2021 | AttributeOut-of-Distribution Generalization | CodeCode Available | 0 |