| Describe Anything Model for Visual Question Answering on Text-rich Images | Jul 16, 2025 | DescriptiveLanguage Modeling | CodeCode Available | 1 |
| Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights | Jul 9, 2025 | DiagnosticMedical Visual Question Answering | —Unverified | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | Jul 9, 2025 | DiagnosticMultimodal Reasoning | —Unverified | 0 |
| Evaluating Attribute Confusion in Fashion Text-to-Image Generation | Jul 9, 2025 | Attributecross-modal alignment | —Unverified | 0 |
| LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation | Jul 9, 2025 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Jul 8, 2025 | ArticlesMultimodal Reasoning | —Unverified | 0 |
| ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | Jul 7, 2025 | HallucinationQuestion Answering | —Unverified | 0 |
| Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models | Jun 28, 2025 | image-classificationImage Classification | CodeCode Available | 0 |
| SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning | Jun 26, 2025 | In-Context LearningMedical Visual Question Answering | —Unverified | 0 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understandingOptical Character Recognition (OCR) | CodeCode Available | 0 |