| Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | Jan 14, 2025 | Language Modeling | Unverified | 0 |
| ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Jan 11, 2025 | Chart Understanding, Code Generation | Code Available | 2 |
| Valley2: Exploring Multimodal Models with Scalable Vision-Language Design | Jan 10, 2025 | Image Captioning, Language Modeling | Code Available | 3 |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | Jan 10, 2025 | Instruction Following, Language Modeling | Unverified | 0 |
| LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | Jan 9, 2025 | Language Modeling | Unverified | 0 |
| Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Jan 3, 2025 | Binary Classification, Face Anti-Spoofing | Unverified | 0 |
| GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model | Jan 1, 2025 | Attribute, Language Modeling | Unverified | 0 |
| Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering | Jan 1, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation | Jan 1, 2025 | Autonomous Driving, Autonomous Vehicles | Unverified | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | Jan 1, 2025 | Code Generation, Image Generation | Unverified | 0 |
| ST^3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | Dec 28, 2024 | Language Modeling | Unverified | 0 |
| MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Dec 27, 2024 | Autonomous Driving, Language Modeling | Code Available | 0 |
| A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | Dec 27, 2024 | Face Swapping, Image Segmentation | Unverified | 0 |
| SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | Dec 22, 2024 | Data Augmentation, Fault Diagnosis | Unverified | 0 |
| MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Dec 20, 2024 | Cancer Classification, Chatbot | Code Available | 1 |
| J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | Dec 20, 2024 | Language Modeling | Unverified | 0 |
| Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Dec 19, 2024 | Contrastive Learning, Language Modeling | Code Available | 0 |
| Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | Dec 17, 2024 | Language Modeling | Unverified | 0 |
| A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | Dec 16, 2024 | Language Modeling | Unverified | 0 |
| IDEA-Bench: How Far are Generative Models from Professional Designing? | Dec 16, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond | Dec 16, 2024 | Language Modeling | Unverified | 0 |
| EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Dec 12, 2024 | Image Comprehension, Image Generation | Unverified | 0 |
| Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Dec 12, 2024 | Language Modeling | Code Available | 2 |
| COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework | Dec 11, 2024 | GPU, Language Modeling | Unverified | 0 |
| DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | Dec 10, 2024 | Image Generation, Language Modeling | Unverified | 0 |