| SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks | May 21, 2025 | Image Classification | Code Available | 0 |
| A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering | Oct 1, 2022 | Medical Visual Question Answering, Question Answering | Code Available | 0 |
| Visual Question Answering using Deep Learning: A Survey and Performance Analysis | Aug 27, 2019 | Common Sense Reasoning, Question Answering | Code Available | 0 |
| General Greedy De-bias Learning | Dec 20, 2021 | Image Classification | Code Available | 0 |
| Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning | Apr 6, 2024 | Domain Generalization, Image Retrieval | Code Available | 0 |
| Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models | Mar 3, 2025 | Memorization, Question Answering | Code Available | 0 |
| Answer Them All! Toward Universal Visual Question Answering Models | Mar 1, 2019 | All, Question Answering | Code Available | 0 |
| Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering | May 21, 2024 | Diversity, Information Retrieval | Code Available | 0 |
| SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency | Oct 20, 2020 | Question Answering, Visual Grounding | Code Available | 0 |
| OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning, Image Classification | Code Available | 0 |
| SparrowVQE: Visual Question Explanation for Course Content Understanding | Nov 12, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing | Jan 29, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |
| Sparse and Structured Visual Attention | Feb 13, 2020 | Image Captioning, Question Answering | Code Available | 0 |
| Robustness through Data Augmentation Loss Consistency | Oct 21, 2021 | Multi-domain Dialogue State Tracking, Visual Question Answering | Code Available | 0 |
| Fully Authentic Visual Question Answering Dataset from Online Communities | Nov 27, 2023 | Question Answering, Visual Question Answering | Code Available | 0 |
| D3: Data Diversity Design for Systematic Generalization in Visual Question Answering | Sep 15, 2023 | Diversity, Question Answering | Code Available | 0 |
| Visual Question Answering: which investigated applications? | Mar 4, 2021 | Image Captioning, Question Answering | Code Available | 0 |
| CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays | May 23, 2025 | Diagnostic, Question Answering | Code Available | 0 |
| cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation | Jun 7, 2022 | Knowledge Distillation, Question Answering | Code Available | 0 |
| Speech-Based Visual Question Answering | May 1, 2017 | Automatic Speech Recognition (ASR) | Code Available | 0 |
| Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms | Aug 29, 2018 | Community Question Answering, General Classification | Code Available | 0 |
| From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models | Dec 21, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Cross-Modal Contrastive Learning for Robust Reasoning in VQA | Nov 21, 2022 | Contrastive Learning, Question Answering | Code Available | 0 |
| FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | May 27, 2025 | Benchmarking, Question Answering | Code Available | 0 |
| Focal Visual-Text Attention for Visual Question Answering | Jun 5, 2018 | Memex Question Answering, Question Answering | Code Available | 0 |
| Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective | Dec 23, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models | Dec 30, 2024 | Question Answering, Scene Classification | Code Available | 0 |