| TextToucher: Fine-Grained Text-to-Touch Generation | Sep 9, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image Captioning, Language Modeling | Code Available | 1 |
| ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | Aug 21, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation | Aug 19, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Aug 19, 2024 | Descriptive, Face Swapping | Code Available | 1 |
| Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | Aug 5, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model | Jul 23, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| A Refer-and-Ground Multimodal Large Language Model for Biomedicine | Jun 26, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution | Jun 24, 2024 | Image Restoration, Image Super-Resolution | Code Available | 1 |
| LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Jun 20, 2024 | 16k, Instruction Following | Code Available | 1 |
| MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model | Jun 3, 2024 | Image Outpainting, Language Modeling | Code Available | 1 |
| Voice Jailbreak Attacks Against GPT-4o | May 29, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| From Text to Pixel: Advancing Long-Context Understanding in MLLMs | May 23, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Apr 1, 2024 | Decision Making, Language Modeling | Code Available | 1 |
| Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Mar 5, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences | Jan 19, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework | Dec 31, 2023 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| Hallucination Augmented Contrastive Learning for Multimodal Large Language Model | Dec 12, 2023 | Contrastive Learning, Hallucination | Code Available | 1 |
| LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Nov 20, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Chain of Images for Intuitively Reasoning | Nov 9, 2023 | Common Sense Reasoning, Language Modelling | Code Available | 1 |
| Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | Oct 29, 2023 | Diagnostic, Language Modeling | Code Available | 1 |
| CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images | Oct 22, 2023 | Diagnostic, Language Modeling | Code Available | 1 |
| UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model | Oct 8, 2023 | Decoder, Language Modeling | Code Available | 1 |
| FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis | Jul 31, 2023 | Language Modeling, Language Modelling | Code Available | 1 |