| mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Feb 12, 2025 | cross-modal alignment, Large Language Model | Code Available | 2 |
| On Fairness of Unified Multimodal Large Language Model for Image Generation | Feb 5, 2025 | Fairness, Image Generation | Unverified | 0 |
| MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | Feb 4, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Jan 29, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Learning Free Token Reduction for Multi-Modal Large Language Models | Jan 29, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | Jan 25, 2025 | Action Understanding, Emotion Recognition | Unverified | 0 |
| PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures | Jan 25, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| EventVL: Understand Event Streams via Multimodal Large Language Model | Jan 23, 2025 | Event-based vision, Language Modeling | Unverified | 0 |
| VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Jan 21, 2025 | Image Generation, Instruction Following | Code Available | 3 |
| EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Jan 20, 2025 | Language Modeling, Language Modelling | Code Available | 1 |