| Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Mar 5, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Oct 17, 2024 | Decision MakingLanguage Modeling | CodeCode Available | 1 |
| MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Jun 17, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution | May 27, 2025 | 8kAvg | CodeCode Available | 1 |
| 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | Jan 14, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image CaptioningLanguage Modeling | CodeCode Available | 1 |
| Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering | Jan 1, 2025 | Large Language ModelMultimodal Large Language Model | CodeCode Available | 1 |
| AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection | Apr 16, 2025 | Anomaly DetectionLarge Language Model | CodeCode Available | 1 |
| MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis | Jun 23, 2025 | DiagnosticLarge Language Model | CodeCode Available | 1 |
| FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis | Jul 31, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 1 |