| Referring to Any Person | Mar 11, 2025 | Large Language ModelMultimodal Large Language Model | CodeCode Available | 2 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Mar 8, 2025 | Image Quality AssessmentLanguage Modeling | CodeCode Available | 2 |
| Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Mar 6, 2025 | General KnowledgeImage Captioning | CodeCode Available | 2 |
| Introducing Visual Perception Token into Multimodal Large Language Model | Feb 24, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Feb 12, 2025 | cross-modal alignmentLarge Language Model | CodeCode Available | 2 |
| LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Jan 14, 2025 | Feature CompressionLanguage Modeling | CodeCode Available | 2 |
| Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding | Jan 14, 2025 | image-classificationImage Classification | CodeCode Available | 2 |
| ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Jan 11, 2025 | Chart UnderstandingCode Generation | CodeCode Available | 2 |
| Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Dec 12, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection | Nov 26, 2024 | 3D Object DetectionAutonomous Driving | CodeCode Available | 2 |