| MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Oct 17, 2024 | Decision Making, Language Modeling | Code Available | 1 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image Captioning, Language Modeling | Code Available | 1 |
| Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Mar 5, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Dec 20, 2024 | Cancer Classification, Chatbot | Code Available | 1 |
| MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation | Aug 19, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion | May 26, 2025 | Denoising, Image Generation | Code Available | 1 |
| Hespi: A pipeline for automatically detecting information from hebarium specimen sheets | Oct 11, 2024 | Handwritten Text Recognition, HTR | Code Available | 1 |
| Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs | Apr 10, 2025 | Multimodal Large Language Model, Time Series | Code Available | 1 |
| EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Jan 20, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation | Oct 22, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| LMEye: An Interactive Perception Network for Large Language Models | May 5, 2023 | Language Modelling, Large Language Model | Code Available | 1 |
| Unifying Segment Anything in Microscopy with Multimodal Large Language Model | May 16, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Dec 9, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Chain of Images for Intuitively Reasoning | Nov 9, 2023 | Common Sense Reasoning, Language Modelling | Code Available | 1 |
| LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Jun 20, 2024 | 16k, Instruction Following | Code Available | 1 |
| MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis | Jun 23, 2025 | Diagnostic, Large Language Model | Code Available | 1 |
| DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution | Jun 24, 2024 | Image Restoration, Image Super-Resolution | Code Available | 1 |
| LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Nov 20, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| un^2CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP | May 30, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework | Dec 31, 2023 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | Aug 5, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions | Mar 20, 2025 | 2D Object Detection, Distributed Computing | Code Available | 1 |
| Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning | Nov 18, 2024 | Attribute, Compositional Zero-Shot Learning | Code Available | 1 |
| LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Apr 1, 2024 | Decision Making, Language Modeling | Code Available | 1 |