| A Survey on Multimodal Large Language Models | Jun 23, 2023 | Hallucination, In-Context Learning | —Unverified | 0 |
| Audio-Visual LLM for Video Understanding | Dec 11, 2023 | AudioCaps, Language Modeling | —Unverified | 0 |
| Automated radiotherapy treatment planning guided by GPT-4Vision | Jun 21, 2024 | In-Context Learning, Language Modelling | —Unverified | 0 |
| Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction | Sep 2, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | May 17, 2025 | Document Ranking, Large Language Model | —Unverified | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | Jan 1, 2025 | Code Generation, Image Generation | —Unverified | 0 |
| BlueLM-2.5-3B Technical Report | Jul 8, 2025 | Large Language Model, Multimodal Large Language Model | —Unverified | 0 |
| CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches | Sep 26, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | May 20, 2025 | Automated Essay Scoring, Diversity | —Unverified | 0 |
| Can Multimodal Large Language Model Think Analogically? | Nov 2, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | Nov 11, 2024 | 2D Pose Estimation, Category-Agnostic Pose Estimation | —Unverified | 0 |
| CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion | Aug 21, 2024 | Language Modelling, Large Language Model | —Unverified | 0 |
| CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model | Jun 16, 2025 | Decision Making, Financial Analysis | —Unverified | 0 |
| ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images | Apr 17, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| ChatGPT Meets Iris Biometrics | Aug 9, 2024 | Face Recognition, Iris Recognition | —Unverified | 0 |
| ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning | Jul 18, 2023 | Instruction Following, Language Modeling | —Unverified | 0 |
| ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | Nov 4, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | Jul 14, 2025 | Large Language Model, Multimodal Large Language Model | —Unverified | 0 |
| CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | Mar 13, 2025 | Large Language Model, Multimodal Large Language Model | —Unverified | 0 |
| CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | Apr 14, 2025 | Autonomous Navigation, Lane Detection | —Unverified | 0 |
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Mar 1, 2025 | Continual Learning, Language Modeling | —Unverified | 0 |
| CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation | Sep 24, 2024 | Contrastive Learning, Language Modeling | —Unverified | 0 |
| CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | Nov 30, 2023 | Image Generation, In-Context Learning | —Unverified | 0 |
| CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation | Jan 1, 2024 | Image Generation, Language Modeling | —Unverified | 0 |
| COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework | Dec 11, 2024 | GPU, Language Modeling | —Unverified | 0 |