| AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making | Jun 14, 2025 | Decision Making, Question Answering | —Unverified | 0 | 0 |
| Learning Sparsity for Effective and Efficient Music Performance Question Answering | Jun 2, 2025 | Audio-visual Question Answering, Question Answering | —Unverified | 0 | 0 |
| Dual Capsule Attention Mask Network with Mutual Learning for Visual Question Answering | Oct 1, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Learning Sparse Mixture of Experts for Visual Question Answering | Sep 19, 2019 | Mixture-of-Experts, Question Answering | —Unverified | 0 | 0 |
| Learning Rich Image Region Representation for Visual Question Answering | Oct 29, 2019 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering | Feb 18, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues | Mar 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering | Apr 16, 2016 | General Classification, Human-Object Interaction Detection | —Unverified | 0 | 0 |
| Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | Feb 13, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Breaking Neural Network Scaling Laws with Modularity | Sep 9, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback | Nov 29, 2023 | Image Generation, Question Answering | —Unverified | 0 | 0 |
| Breaking Down Questions for Outside-Knowledge Visual Question Answering | Nov 16, 2021 | Graph Neural Network, Question Answering | —Unverified | 0 | 0 |
| Answer-Type Prediction for Visual Question Answering | Jun 1, 2016 | Object Recognition, Prediction | —Unverified | 0 | 0 |
| Adversarial Representation Learning for Text-to-Image Matching | Aug 28, 2019 | Image Captioning, Language Modeling | —Unverified | 0 | 0 |
| Learning Compositional Representation for Few-shot Visual Question Answering | Feb 21, 2021 | Attribute, Question Answering | —Unverified | 0 | 0 |
| Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Oct 24, 2022 | cross-modal alignment, Cross-Modal Retrieval | —Unverified | 0 | 0 |
| Learning by Asking Questions | Dec 4, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Learning Answer Embeddings for Visual Question Answering | Jun 10, 2018 | Question Answering, Transfer Learning | —Unverified | 0 | 0 |
| LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering | Jan 29, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness | Jan 16, 2025 | Adversarial Defense, Adversarial Robustness | —Unverified | 0 | 0 |
| Breaking Down Questions for Outside-Knowledge VQA | Sep 29, 2021 | Graph Neural Network, Question Answering | —Unverified | 0 | 0 |
| LAVIS: A Library for Language-Vision Intelligence | Sep 15, 2022 | Benchmarking, Image Captioning | —Unverified | 0 | 0 |
| LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement | Nov 20, 2024 | Autonomous Driving, Computational Efficiency | —Unverified | 0 | 0 |
| Domain-robust VQA with diverse datasets and methods but no target labels | Mar 29, 2021 | Domain Adaptation, Object Recognition | —Unverified | 0 | 0 |
| Latent Variable Models for Visual Question Answering | Jan 16, 2021 | Benchmarking, Question Answering | —Unverified | 0 | 0 |