| Ask Your Neurons: A Deep Learning Approach to Visual Question Answering | May 9, 2016 | Question Answering, Visual Question Answering | Code Available | 0 |
| Learning Conditioned Graph Structures for Interpretable Visual Question Answering | Jun 19, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |
| QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models | Apr 15, 2025 | Question Answering, Visual Question Answering | Code Available | 0 |
| Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning | Apr 1, 2024 | Image Captioning, Instruction Following | Code Available | 0 |
| VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers | Mar 30, 2022 | Question Answering, Visual Commonsense Reasoning | Code Available | 0 |
| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | May 6, 2022 | Diagnostic, Question Answering | Code Available | 0 |
| QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining | May 29, 2025 | Question Answering, Representation Learning | Code Available | 0 |
| Quantifying and Alleviating the Language Prior Problem in Visual Question Answering | May 13, 2019 | Information Retrieval, Question Answering | Code Available | 0 |
| Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Nov 17, 2024 | Image Captioning, Language Modeling | Code Available | 0 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Nov 18, 2024 | Benchmarking, Multimodal Large Language Model | Code Available | 0 |