| HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning | May 23, 2025 | Large Language Model, Multimodal Large Language Model | —Unverified | 0 | 0 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Apr 25, 2024 | 4k, Language Modeling | —Unverified | 0 | 0 |
| How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model | Nov 10, 2023 | Image Captioning, Language Modeling | —Unverified | 0 | 0 |
| Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification | May 21, 2025 | Data Augmentation, Large Language Model | —Unverified | 0 | 0 |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | Jan 25, 2025 | Action Understanding, Emotion Recognition | —Unverified | 0 | 0 |
| Hybrid Agents for Image Restoration | Mar 13, 2025 | Image Restoration, In-Context Learning | —Unverified | 0 | 0 |
| ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Dec 9, 2024 | Image Generation, Language Modeling | —Unverified | 0 | 0 |
| Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems | Aug 20, 2023 | Emotion Recognition, Language Modelling | —Unverified | 0 | 0 |
| InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Apr 14, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | Oct 24, 2024 | image-classification, Image Classification | —Unverified | 0 | 0 |