| Synthesize Step-by-Step: Tools Templates and LLMs as Data Generators for Reasoning-Based Chart VQA | Jan 1, 2024 | Chart Question Answering, Data Augmentation | —Unverified | 0 | 0 |
| VLMAE: Vision-Language Masked Autoencoder | Aug 19, 2022 | Image-text Retrieval, Language Modeling | —Unverified | 0 | 0 |
| VL-Mamba: Exploring State Space Models for Multimodal Learning | Mar 20, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts | Dec 5, 2024 | Benchmarking, Image Generation | —Unverified | 0 | 0 |
| VLM-Assisted Continual learning for Visual Question Answering in Self-Driving | Feb 2, 2025 | Autonomous Driving, Continual Learning | —Unverified | 0 | 0 |
| Benchmarking Vision Language Models for Cultural Understanding | Jul 15, 2024 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat | May 26, 2025 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation | Dec 13, 2024 | Instruction Following, Question Answering | —Unverified | 0 | 0 |
| EVJVQA Challenge: Multilingual Visual Question Answering | Feb 23, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT | Apr 11, 2023 | Diagnostic, Image Captioning | —Unverified | 0 | 0 |
| Tackling VQA with Pretrained Foundation Models without Further Training | Sep 27, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Sep 21, 2024 | Benchmarking, Depth Estimation | —Unverified | 0 | 0 |
| Take A Step Back: Rethinking the Two Stages in Visual Reasoning | Jul 29, 2024 | Logical Reasoning, Question Answering | —Unverified | 0 | 0 |
| Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded | Feb 11, 2019 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| Talking to the brain: Using Large Language Models as Proxies to Model Brain Semantic Representation | Feb 26, 2025 | Question Answering, valid | —Unverified | 0 | 0 |
| A dataset of clinically generated visual questions and answers about radiology images | Nov 20, 2018 | Decision Making, Medical Visual Question Answering | —Unverified | 0 | 0 |
| VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | Jul 29, 2024 | Deep Learning, Domain Generalization | —Unverified | 0 | 0 |
| Task-driven Visual Saliency and Attention-based Visual Question Answering | Feb 22, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Task Formulation Matters When Learning Continuously: A Case Study in Visual Question Answering | Jan 16, 2022 | Continual Learning, Incremental Learning | —Unverified | 0 | 0 |
| Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs | May 3, 2025 | Chunking, Question Answering | —Unverified | 0 | 0 |
| Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference | Mar 17, 2025 | Feature Compression, Image Compression | —Unverified | 0 | 0 |
| Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets | Apr 24, 2017 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Task-Oriented Multi-User Semantic Communications | Dec 19, 2021 | Image Retrieval, Machine Translation | —Unverified | 0 | 0 |
| Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks | May 5, 2025 | Question Answering, Semantic Communication | —Unverified | 0 | 0 |
| Task Progressive Curriculum Learning for Robust Visual Question Answering | Nov 26, 2024 | Data Augmentation, Ensemble Learning | —Unverified | 0 | 0 |