| Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences | Jan 19, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | Jan 19, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation | Jan 1, 2024 | Image GenerationLanguage Modeling | —Unverified | 0 |
| LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Jan 1, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework | Dec 31, 2023 | Large Language ModelMultimodal Large Language Model | CodeCode Available | 1 |
| TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Dec 28, 2023 | Computational EfficiencyImage Captioning | CodeCode Available | 3 |
| ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation | Dec 24, 2023 | Common Sense ReasoningLanguage Modeling | —Unverified | 0 |
| StarVector: Generating Scalable Vector Graphics Code from Images and Text | Dec 17, 2023 | Code GenerationLanguage Modeling | CodeCode Available | 5 |
| Hallucination Augmented Contrastive Learning for Multimodal Large Language Model | Dec 12, 2023 | Contrastive LearningHallucination | CodeCode Available | 1 |
| Audio-Visual LLM for Video Understanding | Dec 11, 2023 | AudioCapsLanguage Modeling | —Unverified | 0 |