| VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model | Jun 3, 2024 | Image Outpainting, Language Modeling | Code Available | 1 |
| Ovis: Structural Embedding Alignment for Multimodal Large Language Model | May 31, 2024 | Language Modeling, Multimodal Large Language Model | Code Available | 5 |
| Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak | May 30, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Voice Jailbreak Attacks Against GPT-4o | May 29, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model | May 28, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | May 27, 2024 | Instruction Following, Language Modeling | Unverified | 0 |
| A Survey of Multimodal Large Language Model from A Data-centric Perspective | May 26, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | May 24, 2024 | Language Modelling, Large Language Model | Code Available | 0 |
| AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | May 23, 2024 | cross-modal alignment, Language Modelling | Unverified | 0 |
| From Text to Pixel: Advancing Long-Context Understanding in MLLMs | May 23, 2024 | Language Modeling, Language Modelling | Code Available | 1 |