| Mitigating Object Hallucinations via Sentence-Level Early Intervention | Jul 16, 2025 | Hallucination, MM-Vet | Code Available | 1 | 5 |
| Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models | Feb 16, 2024 | Diversity, Instruction Following | Code Available | 1 | 5 |
| Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels? | Nov 29, 2023 | In-Context Learning, Instruction Following | Code Available | 1 | 5 |
| OmniFusion Technical Report | Apr 9, 2024 | MM-Vet, TextVQA | Code Available | 0 | 5 |
| Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model | Oct 31, 2023 | Autonomous Driving, Language Modeling | — Unverified | 0 | 0 |
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Jan 1, 2025 | MM-Vet, Multimodal Reasoning | — Unverified | 0 | 0 |
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Mar 19, 2025 | MM-Vet, Multimodal Reasoning | — Unverified | 0 | 0 |
| MR. Judge: Multimodal Reasoner as a Judge | May 19, 2025 | MM-Vet, Multiple-choice | — Unverified | 0 | 0 |
| DIEM: Decomposition-Integration Enhancing Multimodal Insights | Jan 1, 2024 | MM-Vet, Question Answering | — Unverified | 0 | 0 |