| Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification | May 21, 2025 | Data Augmentation, Large Language Model | —Unverified | 0 | 0 |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | Jan 25, 2025 | Action Understanding, Emotion Recognition | —Unverified | 0 | 0 |
| Hybrid Agents for Image Restoration | Mar 13, 2025 | Image Restoration, In-Context Learning | —Unverified | 0 | 0 |
| ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Dec 9, 2024 | Image Generation, Language Modeling | —Unverified | 0 | 0 |
| Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems | Aug 20, 2023 | Emotion Recognition, Language Modelling | —Unverified | 0 | 0 |
| Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | Oct 24, 2024 | image-classification, Image Classification | —Unverified | 0 | 0 |
| Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics | Jan 16, 2025 | Large Language Model, Multimodal Large Language Model | —Unverified | 0 | 0 |
| Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Jan 3, 2025 | Binary Classification, Face Anti-Spoofing | —Unverified | 0 | 0 |
| Investigating the Catastrophic Forgetting in Multimodal Large Language Models | Sep 19, 2023 | image-classification, Image Classification | —Unverified | 0 | 0 |
| Is your multimodal large language model a good science tutor? | May 9, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | Dec 20, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model | Jul 15, 2025 | Keypoint Detection, Language Modeling | —Unverified | 0 | 0 |
| Learning Free Token Reduction for Multi-Modal Large Language Models | Jan 29, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| LEGION: Learning to Ground and Explain for Synthetic Image Detection | Mar 19, 2025 | Artifact Detection, Image Manipulation | —Unverified | 0 | 0 |
| Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring | Feb 16, 2025 | Instance Segmentation, Language Modeling | —Unverified | 0 | 0 |
| Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition | Mar 10, 2025 | Disaster Response, Large Language Model | —Unverified | 0 | 0 |
| LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning | May 22, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | Feb 9, 2024 | Benchmarking, Chatbot | —Unverified | 0 | 0 |
| LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs | Jan 29, 2024 | Language Modelling, Large Language Model | —Unverified | 0 | 0 |
| LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | Jan 9, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models | Jul 27, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | Oct 19, 2024 | Instruction Following, Knowledge Distillation | —Unverified | 0 | 0 |
| LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | Jul 15, 2025 | Diagnostic, Large Language Model | —Unverified | 0 | 0 |
| Lumos : Empowering Multimodal LLMs with Scene Text Recognition | Feb 12, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | Dec 17, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation | Dec 24, 2023 | Common Sense Reasoning, Language Modeling | —Unverified | 0 | 0 |
| Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment | Apr 10, 2025 | AI Agent, Attribute | —Unverified | 0 | 0 |
| Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | Jun 28, 2025 | Cross-Modal Retrieval, Image Captioning | —Unverified | 0 | 0 |
| MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | Aug 22, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | Apr 14, 2025 | Computational Efficiency, Language Modeling | —Unverified | 0 | 0 |
| Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | Nov 19, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | Dec 4, 2023 | Instruction Following, Language Modeling | —Unverified | 0 | 0 |
| MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery | Feb 28, 2024 | Knowledge Distillation, Language Modeling | —Unverified | 0 | 0 |
| MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | May 21, 2025 | Emotion Recognition, Face Detection | —Unverified | 0 | 0 |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | Jan 10, 2025 | Instruction Following, Language Modeling | —Unverified | 0 | 0 |
| Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | Nov 15, 2024 | Hallucination, Hallucination Evaluation | —Unverified | 0 | 0 |
| MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning | Sep 9, 2024 | Federated Learning, Image Captioning | —Unverified | 0 | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Mar 23, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | May 26, 2025 | Image Retrieval, Large Language Model | —Unverified | 0 | 0 |
| MLLMReID: Multimodal Large Language Model-based Person Re-identification | Jan 24, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal | Feb 17, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation | Feb 17, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| MobileFlow: A Multimodal LLM For Mobile GUI Agent | Jul 5, 2024 | Action Analysis, Language Modelling | —Unverified | 0 | 0 |
| MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description | Oct 15, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills | May 9, 2025 | Image Retouching, Large Language Model | —Unverified | 0 | 0 |
| MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | Dec 2, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | Feb 4, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration | Jul 4, 2024 | Denoising, Image Restoration | —Unverified | 0 | 0 |
| MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | Jun 22, 2024 | Common Sense Reasoning, Language Modelling | —Unverified | 0 | 0 |
| Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | Sep 10, 2024 | Autonomous Vehicles, Language Modeling | —Unverified | 0 | 0 |