| InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts | May 25, 2025 | Chart Understanding, Question Answering | Code Available | 3 |
| SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards | May 25, 2025 | Image Captioning, Multimodal Reasoning | Code Available | 1 |
| Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering | May 25, 2025 | Anatomy, Benchmarking | Code Available | 1 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | May 25, 2025 | Caption Generation, Question Answering | Unverified | 0 |
| Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning | May 23, 2025 | Decoder, Image Captioning | Code Available | 4 |
| CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays | May 23, 2025 | Diagnostic, Question Answering | Code Available | 0 |
| VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models | May 23, 2025 | Question Answering, Visual Question Answering | Code Available | 1 |
| CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering | May 22, 2025 | Computed Tomography (CT), Question Answering | Unverified | 0 |
| Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | May 22, 2025 | Benchmarking, Evidence Selection | Code Available | 1 |
| A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering | May 22, 2025 | counterfactual, Medical Visual Question Answering | Unverified | 0 |