| MagicQuill: An Intelligent Interactive Image Editing System | Nov 14, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| VITA: Towards Open-Source Interactive Omni Multimodal LLM | Aug 9, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | May 14, 2024 | Image GenerationLanguage Modeling | CodeCode Available | 7 |
| VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jun 12, 2024 | Image GenerationLanguage Modeling | CodeCode Available | 5 |
| StarVector: Generating Scalable Vector Graphics Code from Images and Text | Dec 17, 2023 | Code GenerationLanguage Modeling | CodeCode Available | 5 |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | Oct 11, 2023 | HallucinationLanguage Modeling | CodeCode Available | 5 |
| R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Mar 7, 2025 | Emotion RecognitionLanguage Modeling | CodeCode Available | 5 |
| ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Jun 26, 2025 | Audio GenerationLarge Language Model | CodeCode Available | 5 |
| Ovis: Structural Embedding Alignment for Multimodal Large Language Model | May 31, 2024 | Language ModelingMultimodal Large Language Model | CodeCode Available | 5 |
| Liquid: Language Models are Scalable Multi-modal Generators | Dec 5, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 4 |