| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Mar 1, 2025 | Continual Learning, Language Modeling | Unverified | 0 |
| Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy | Feb 27, 2025 | Large Language Model, Minecraft | Unverified | 0 |
| OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Feb 22, 2025 | document understanding, Key Information Extraction | Unverified | 0 |
| Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation | Feb 17, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring | Feb 16, 2025 | Instance Segmentation, Language Modeling | Unverified | 0 |
| Distraction is All You Need for Multimodal Large Language Model Jailbreaking | Feb 15, 2025 | All, Language Modeling | Unverified | 0 |
| On Fairness of Unified Multimodal Large Language Model for Image Generation | Feb 5, 2025 | Fairness, Image Generation | Unverified | 0 |
| MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | Feb 4, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Jan 29, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Learning Free Token Reduction for Multi-Modal Large Language Models | Jan 29, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | Jan 25, 2025 | Action Understanding, Emotion Recognition | Unverified | 0 |
| EventVL: Understand Event Streams via Multimodal Large Language Model | Jan 23, 2025 | Event-based vision, Language Modeling | Unverified | 0 |
| Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics | Jan 16, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | Jan 14, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | Jan 10, 2025 | Instruction Following, Language Modeling | Unverified | 0 |
| LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | Jan 9, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Jan 3, 2025 | Binary Classification, Face Anti-Spoofing | Unverified | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | Jan 1, 2025 | Code Generation, Image Generation | Unverified | 0 |
| S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation | Jan 1, 2025 | Autonomous Driving, Autonomous Vehicles | Unverified | 0 |
| GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model | Jan 1, 2025 | Attribute, Language Modeling | Unverified | 0 |
| ST^3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | Dec 28, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Dec 27, 2024 | Autonomous Driving, Language Modeling | Code Available | 0 |
| A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | Dec 27, 2024 | Face Swapping, Image Segmentation | Unverified | 0 |
| SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | Dec 22, 2024 | Data Augmentation, Fault Diagnosis | Unverified | 0 |