| Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models | Feb 13, 2024 | Image Comprehension, Multimodal Recommendation | Unverified | 0 |
| RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models | Mar 25, 2025 | Image Comprehension, Visual Reasoning | Unverified | 0 |
| InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Sep 26, 2023 | Articles, Image Comprehension | Code Available | 0 |
| RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback | Jan 1, 2025 | Hallucination, Image Comprehension | Code Available | 0 |
| FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion | Oct 16, 2024 | Articles, Image Comprehension | Code Available | 0 |
| CLIC: Contrastive Learning Framework for Unsupervised Image Complexity Representation | Nov 19, 2024 | Attribute, Contrastive Learning | Code Available | 0 |
| MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Apr 7, 2024 | Image Comprehension, Math | Code Available | 0 |
| MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval | Nov 13, 2024 | Image Comprehension, Information Retrieval | Code Available | 0 |
| VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning | Jun 20, 2024 | Image Comprehension, Question Answering | Code Available | 0 |