| Towards Visual Question Answering on Pathology Images | Aug 1, 2021 | Decision Making, Question Answering | Code Available | 0 |
| Active Learning for Visual Question Answering: An Empirical Study | Nov 6, 2017 | Active Learning, Visual Question Answering | Code Available | 0 |
| Improved RAMEN: Towards Domain Generalization for Visual Question Answering | Sep 6, 2021 | Domain Generalization, Question Answering | Code Available | 0 |
| Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering | Apr 3, 2018 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 |
| RUBi: Reducing Unimodal Biases for Visual Question Answering | Dec 1, 2019 | Question Answering, Visual Question Answering | Code Available | 0 |
| RUBi: Reducing Unimodal Biases in Visual Question Answering | Jun 24, 2019 | Question Answering, Visual Question Answering | Code Available | 0 |
| Image Content Generation with Causal Reasoning | Dec 12, 2023 | Image Generation, Question Answering | Code Available | 0 |
| Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training | Jun 13, 2023 | image-classification, Image Classification | Code Available | 0 |
| Zero-shot Translation of Attention Patterns in VQA Models to Natural Language | Nov 8, 2023 | Image Captioning, Language Modeling | Code Available | 0 |
| Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues | Dec 17, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering | Sep 15, 2021 | Image Captioning, Knowledge Graphs | Code Available | 0 |
| Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | Aug 22, 2022 | All, Cross-Modal Retrieval | Code Available | 0 |
| ArtQuest: Countering Hidden Language Biases in ArtVQA | Jan 4, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| IMAD: IMage-Augmented multi-modal Dialogue | May 17, 2023 | Dialogue Generation, Question Answering | Code Available | 0 |
| Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering | Aug 10, 2017 | Question Answering, Visual Question Answering | Code Available | 0 |
| Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Dec 11, 2024 | Benchmarking, Question Answering | Code Available | 0 |
| Transfer Learning via Unsupervised Task Discovery for Visual Question Answering | Oct 3, 2018 | Question Answering, Transfer Learning | Code Available | 0 |
| Transformer Module Networks for Systematic Generalization in Visual Question Answering | Jan 27, 2022 | Question Answering, Systematic Generalization | Code Available | 0 |
| Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking | Oct 11, 2021 | Benchmarking, Question Answering | Code Available | 0 |
| BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis | Aug 10, 2021 | Language Modeling, Language Modelling | Code Available | 0 |
| Scene Graph Prediction with Limited Labels | Apr 25, 2019 | Knowledge Base Completion, Prediction | Code Available | 0 |
| LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering | Dec 16, 2024 | In-Context Learning, Instruction Following | Code Available | 0 |
| Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Jun 11, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning | Mar 14, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |
| ILLUME: Rationalizing Vision-Language Models through Human Interactions | Aug 17, 2022 | Image Captioning, Question Answering | Code Available | 0 |
| Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference | Feb 25, 2025 | Question Answering, RAG | Code Available | 0 |
| IIU: Independent Inference Units for Knowledge-based Visual Question Answering | Aug 15, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs | May 21, 2025 | Benchmarking, Question Answering | Code Available | 0 |
| Visually Dehallucinative Instruction Generation | Feb 13, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering | Feb 16, 2024 | Question Answering, Triplet | Code Available | 0 |
| Treble Counterfactual VLMs: A Causal Approach to Hallucination | Mar 8, 2025 | Autonomous Driving, counterfactual | Code Available | 0 |
| Visually Grounded VQA by Lattice-based Retrieval | Nov 15, 2022 | Information Retrieval, Question Answering | Code Available | 0 |
| Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks | Sep 11, 2024 | Image Captioning, Question Answering | Code Available | 0 |
| Visually Interpretable Subtask Reasoning for Visual Question Answering | May 12, 2025 | Attribute, Object Recognition | Code Available | 0 |
| Barlow constrained optimization for Visual Question Answering | Mar 7, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data | Oct 1, 2024 | Code Generation, Logical Reasoning | Code Available | 0 |
| Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training | Mar 30, 2024 | Contrastive Learning, Question Answering | Code Available | 0 |
| HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | May 16, 2025 | Benchmarking, Ethics | Code Available | 0 |
| HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Jun 25, 2025 | Benchmarking, Person Identification | Code Available | 0 |
| AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning | Jan 1, 2025 | Audio-visual Question Answering, Continual Learning | Code Available | 0 |
| TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Oct 5, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Delving Deeper into Cross-lingual Visual Question Answering | Feb 15, 2022 | Inductive Bias, Question Answering | Code Available | 0 |
| Why do These Match? Explaining the Behavior of Image Similarity Models | May 26, 2019 | Attribute, General Classification | Code Available | 0 |
| Towards Flexible Evaluation for Generative Visual Question Answering | Aug 1, 2024 | Decoder, Generative Visual Question Answering | Code Available | 0 |
| Analyzing the Behavior of Visual Question Answering Models | Jun 23, 2016 | Question Answering, Visual Question Answering | Code Available | 0 |
| Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering | Mar 9, 2021 | Optical Character Recognition (OCR), Question Answering | Code Available | 0 |
| Self-Critical Reasoning for Robust Visual Question Answering | May 24, 2019 | Question Answering, Visual Question Answering | Code Available | 0 |
| Visual Question Answering: A Survey of Methods and Datasets | Jul 20, 2016 | General Knowledge, Survey | Code Available | 0 |
| WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models | Jul 25, 2022 | Common Sense Reasoning, General Knowledge | Code Available | 0 |
| How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? | Sep 3, 2024 | In-Context Learning, Language Modeling | Code Available | 0 |