| Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Apr 19, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 4 |
| MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Apr 4, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 4 |
| SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing | May 7, 2024 | Image ManipulationLanguage Modeling | CodeCode Available | 4 |
| Liquid: Language Models are Scalable Multi-modal Generators | Dec 5, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 4 |
| R1-Onevision:An Open-Source Multimodal Large Language Model Capable of Deep Reasoning | Feb 24, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 4 |
| Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Apr 16, 2024 | Feature EngineeringLanguage Modeling | CodeCode Available | 3 |
| Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Dec 3, 2024 | Change DetectionDescriptive | CodeCode Available | 3 |
| ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Feb 27, 2024 | 3D geometry3D Object Captioning | CodeCode Available | 3 |
| Multimodal Table Understanding | Jun 12, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 3 |
| Baichuan-Omni Technical Report | Oct 11, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 3 |