| ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Jan 11, 2025 | Chart Understanding, Code Generation | Code Available | 2 | 5 |
| LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Jan 1, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Oct 8, 2024 | document understanding, Language Modeling | Code Available | 2 | 5 |
| Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Dec 12, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | All, Image Segmentation | Code Available | 2 | 5 |
| LLMGA: Multimodal Large Language Model based Generation Assistant | Nov 27, 2023 | Image Generation, Language Modeling | Code Available | 2 | 5 |
| Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Mar 29, 2024 | Instruction Following, Language Modelling | Code Available | 2 | 5 |
| Paint by Inpaint: Learning to Add Image Objects by Removing Them First | Apr 28, 2024 | Image Inpainting, Language Modeling | Code Available | 2 | 5 |
| Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Jun 18, 2024 | Anomaly Detection, Anomaly Localization | Code Available | 2 | 5 |
| mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Feb 12, 2025 | cross-modal alignment, Large Language Model | Code Available | 2 | 5 |
| GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering | Feb 4, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| A Survey of Multimodal Large Language Model from A Data-centric Perspective | May 26, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | May 22, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Explore the Limits of Omni-modal Pretraining at Scale | Jun 13, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Mar 8, 2025 | Image Quality Assessment, Language Modeling | Code Available | 2 | 5 |
| Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding | Jan 14, 2025 | image-classification, Image Classification | Code Available | 2 | 5 |
| CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Mar 7, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 2 | 5 |
| T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Jul 19, 2024 | Attribute, Language Modeling | Code Available | 2 | 5 |
| Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation | Oct 22, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 1 | 5 |
| MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis | Jun 23, 2025 | Diagnostic, Large Language Model | Code Available | 1 | 5 |
| LMEye: An Interactive Perception Network for Large Language Models | May 5, 2023 | Language Modelling, Large Language Model | Code Available | 1 | 5 |
| Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Mar 5, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution | Jun 24, 2024 | Image Restoration, Image Super-Resolution | Code Available | 1 | 5 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image Captioning, Language Modeling | Code Available | 1 | 5 |