| MobileVLM V2: Faster and Stronger Baseline for Vision Language Model | Feb 6, 2024 | AutoML; Language Modeling | Code Available | 5 |
| Ovis: Structural Embedding Alignment for Multimodal Large Language Model | May 31, 2024 | Language Modeling; Multimodal Large Language Model | Code Available | 5 |
| InstructPix2Pix: Learning to Follow Image Editing Instructions | Nov 17, 2022 | Image Editing | Code Available | 5 |
| NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms | Feb 25, 2025 | Language Modeling; Language Modelling | Code Available | 5 |
| Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | Feb 2, 2024 | Acoustic Scene Classification; Audio captioning | Code Available | 5 |
| HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation | Feb 14, 2025 | Language Modeling; Language Modelling | Code Available | 5 |
| CogVLM: Visual Expert for Pretrained Language Models | Nov 6, 2023 | 1 Image, 2*2 Stitching; FS-MEVQA | Code Available | 5 |
| MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments | Feb 1, 2024 | Embodied Question Answering; Language Modeling | Code Available | 5 |
| FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning | Feb 29, 2024 | GPU; Language Modeling | Code Available | 5 |
| MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts | Apr 13, 2024 | Diversity; Language Modeling | Code Available | 5 |