| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Mar 1, 2025 | Continual Learning, Language Modeling | —Unverified | 0 |
| Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy | Feb 27, 2025 | Large Language Model, Minecraft | —Unverified | 0 |
| OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Feb 22, 2025 | document understanding, Key Information Extraction | —Unverified | 0 |
| Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation | Feb 17, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring | Feb 16, 2025 | Instance Segmentation, Language Modeling | —Unverified | 0 |
| Distraction is All You Need for Multimodal Large Language Model Jailbreaking | Feb 15, 2025 | All, Language Modeling | —Unverified | 0 |
| On Fairness of Unified Multimodal Large Language Model for Image Generation | Feb 5, 2025 | Fairness, Image Generation | —Unverified | 0 |
| MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | Feb 4, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Jan 29, 2025 | Language Modeling, Language Modelling | Code Available | 0 |