| Language bias in Visual Question Answering: A Survey and Taxonomy | Nov 16, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Language Features Matter: Effective Language Representations for Vision-Language Tasks | Aug 17, 2019 | Image Captioning, Language Modelling | —Unverified | 0 | 0 |
| From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Nov 5, 2024 | Change Detection, Contrastive Learning | —Unverified | 0 | 0 |
| Language-Image Models with 3D Understanding | May 6, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| From Pixels to Objects: Cubic Visual Attention for Visual Question Answering | Jun 4, 2022 | Object, Question Answering | —Unverified | 0 | 0 |
| Language Is Not All You Need: Aligning Perception with Language Models | Feb 27, 2023 | All, Image Captioning | —Unverified | 0 | 0 |
| From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts | Nov 30, 2018 | Novel Concepts, Question Answering | —Unverified | 0 | 0 |
| From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities | Nov 1, 2023 | Navigate, Question Answering | —Unverified | 0 | 0 |
| From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models | Jan 1, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration | Mar 17, 2025 | Denoising, Question Answering | —Unverified | 0 | 0 |
| Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking, Large Language Model | —Unverified | 0 | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | Mar 19, 2025 | Language Model Evaluation, Language Modeling | —Unverified | 0 | 0 |
| From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data | May 6, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| freePruner: A Training-free Approach for Large Multimodal Model Acceleration | Nov 23, 2024 | Quantization, Question Answering | —Unverified | 0 | 0 |
| Free Form Medical Visual Question Answering in Radiology | Jan 23, 2024 | Diagnostic, Form | —Unverified | 0 | 0 |
| Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Aug 23, 2024 | Instruction Following, Knowledge Distillation | —Unverified | 0 | 0 |
| Large Scale Scene Text Verification with Guided Attention | Apr 23, 2018 | Question Answering, Scene Text Detection | —Unverified | 0 | 0 |
| Large Vision-Language Models for Remote Sensing Visual Question Answering | Nov 16, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models | May 31, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Latent Variable Models for Visual Question Answering | Jan 16, 2021 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| Fooling Vision and Language Models Despite Localization and Attention Mechanism | Sep 25, 2017 | Dense Captioning, Natural Language Understanding | —Unverified | 0 | 0 |
| LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement | Nov 20, 2024 | Autonomous Driving, Computational Efficiency | —Unverified | 0 | 0 |
| LAVIS: A Library for Language-Vision Intelligence | Sep 15, 2022 | Benchmarking, Image Captioning | —Unverified | 0 | 0 |
| VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena | Aug 17, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | May 2, 2022 | Decoder, Image Captioning | —Unverified | 0 | 0 |