| Logical Implications for Visual Question Answering Consistency | Mar 16, 2023 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Locally Smoothed Neural Networks | Nov 22, 2017 | Face Verification, Question Answering | Code Available | 0 | 5 |
| Dynamic Memory Networks for Visual and Textual Question Answering | Mar 4, 2016 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering | Mar 6, 2022 | Graph Attention, Question Answering | Code Available | 0 | 5 |
| LLaVA-OneVision: Easy Visual Task Transfer | Aug 6, 2024 | 3D Question Answering (3D-QA) | Code Available | 0 | 5 |
| Learning Visual Question Answering by Bootstrapping Hard Attention | Aug 1, 2018 | Hard Attention, Question Answering | Code Available | 0 | 5 |
| LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Feb 26, 2024 | Continual Learning, Exemplar-Free | Code Available | 0 | 5 |
| Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View | Oct 30, 2020 | Face Recognition, image-classification | Code Available | 0 | 5 |
| Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | Sep 18, 2022 | Attribute, Question Answering | Code Available | 0 | 5 |
| Siamese Tracking with Lingual Object Constraints | Nov 23, 2020 | Object, Object Tracking | Code Available | 0 | 5 |
| Learning Visual Knowledge Memory Networks for Visual Question Answering | Jun 13, 2018 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering | Jun 1, 2019 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Learning to Specialize with Knowledge Distillation for Visual Question Answering | Dec 1, 2018 | General Classification, General Knowledge | Unverified | 0 | 0 |
| Learning to Select Question-Relevant Relations for Visual Question Answering | Jun 1, 2021 | Graph Attention, Question Answering | Unverified | 0 | 0 |
| Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering | Dec 13, 2018 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks | Apr 14, 2025 | Ethics, Fairness | Unverified | 0 | 0 |
| Learning to Recognize the Unseen Visual Predicates | Sep 25, 2019 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Neural Reasoning, Fast and Slow, for Video Question Answering | Jul 10, 2019 | Natural Questions, Question Answering | Unverified | 0 | 0 |
| DUBLIN -- Document Understanding By Language-Image Network | May 23, 2023 | Document Classification, document understanding | Unverified | 0 | 0 |
| BuDDIE: A Business Document Dataset for Multi-task Information Extraction | Apr 5, 2024 | Document Classification, document understanding | Unverified | 0 | 0 |
| Learning to Disambiguate by Asking Discriminative Questions | Aug 9, 2017 | Benchmarking, Image Captioning | Unverified | 0 | 0 |
| Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Sep 11, 2024 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Learning to Compose Diversified Prompts for Image Emotion Classification | Jan 26, 2022 | Classification, Emotion Classification | Unverified | 0 | 0 |
| DualNet: Domain-Invariant Network for Visual Question Answering | Jun 20, 2016 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Bridging the Semantic Gaps: Improving Medical VQA Consistency with LLM-Augmented Question Sets | Apr 16, 2025 | Diversity, Medical Visual Question Answering | Unverified | 0 | 0 |