| ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Feb 27, 2024 | 3D geometry3D Object Captioning | CodeCode Available | 3 |
| TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Dec 28, 2023 | Computational EfficiencyImage Captioning | CodeCode Available | 3 |
| Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | May 22, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | May 21, 2025 | Large Language ModelMultimodal Large Language Model | CodeCode Available | 2 |
| The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | Apr 14, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Referring to Any Person | Mar 11, 2025 | Large Language ModelMultimodal Large Language Model | CodeCode Available | 2 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Mar 8, 2025 | Image Quality AssessmentLanguage Modeling | CodeCode Available | 2 |
| Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Mar 6, 2025 | General KnowledgeImage Captioning | CodeCode Available | 2 |
| Introducing Visual Perception Token into Multimodal Large Language Model | Feb 24, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Feb 12, 2025 | cross-modal alignmentLarge Language Model | CodeCode Available | 2 |
| Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding | Jan 14, 2025 | image-classificationImage Classification | CodeCode Available | 2 |
| LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Jan 14, 2025 | Feature CompressionLanguage Modeling | CodeCode Available | 2 |
| ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Jan 11, 2025 | Chart UnderstandingCode Generation | CodeCode Available | 2 |
| Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Dec 12, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection | Nov 26, 2024 | 3D Object DetectionAutonomous Driving | CodeCode Available | 2 |
| LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Nov 14, 2024 | Earth ObservationInstruction Following | CodeCode Available | 2 |
| StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Nov 11, 2024 | Large Language ModelMultimodal Large Language Model | CodeCode Available | 2 |
| Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench | Oct 29, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation | Oct 19, 2024 | AI AgentBenchmarking | CodeCode Available | 2 |
| PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Oct 8, 2024 | document understandingLanguage Modeling | CodeCode Available | 2 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | AllImage Segmentation | CodeCode Available | 2 |
| CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Sep 28, 2024 | image-classificationImage Classification | CodeCode Available | 2 |
| ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area | Aug 14, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis | Aug 14, 2024 | Anomaly DetectionBoundary Detection | CodeCode Available | 2 |
| T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Jul 19, 2024 | AttributeLanguage Modeling | CodeCode Available | 2 |