| Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation | Aug 1, 2024 | HallucinationImage Comprehension | —Unverified | 0 | 0 |
| On the Performance of Multimodal Language Models | Oct 4, 2023 | BenchmarkingBinary Classification | —Unverified | 0 | 0 |
| RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving | Mar 18, 2025 | Autonomous DrivingDecision Making | —Unverified | 0 | 0 |
| Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models | Feb 13, 2024 | Image ComprehensionMultimodal Recommendation | —Unverified | 0 | 0 |
| RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models | Mar 25, 2025 | Image ComprehensionVisual Reasoning | —Unverified | 0 | 0 |
| SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models | Feb 18, 2025 | Image ComprehensionQuestion Answering | —Unverified | 0 | 0 |
| SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | Jan 18, 2024 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges | Dec 4, 2024 | Code GenerationImage Comprehension | —Unverified | 0 | 0 |
| Teach Multimodal LLMs to Comprehend Electrocardiographic Images | Oct 21, 2024 | DiagnosticImage Comprehension | —Unverified | 0 | 0 |