| LMEye: An Interactive Perception Network for Large Language Models | May 5, 2023 | Language ModellingLarge Language Model | CodeCode Available | 1 | 5 |
| Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | Aug 5, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions | Mar 20, 2025 | 2D Object DetectionDistributed Computing | CodeCode Available | 1 | 5 |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image CaptioningIn-Context Learning | CodeCode Available | 1 | 5 |
| Towards Text-Image Interleaved Retrieval | Feb 18, 2025 | Information RetrievalLanguage Modeling | CodeCode Available | 1 | 5 |
| TextToucher: Fine-Grained Text-to-Touch Generation | Sep 9, 2024 | Language ModellingLarge Language Model | CodeCode Available | 1 | 5 |
| VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model | Jun 3, 2024 | Image OutpaintingLanguage Modeling | CodeCode Available | 1 | 5 |
| Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | May 26, 2025 | image-classificationImage Classification | CodeCode Available | 0 | 5 |
| SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion | Oct 3, 2024 | Adversarial AttackDenoising | CodeCode Available | 0 | 5 |
| AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Aug 30, 2024 | Language ModellingLarge Language Model | CodeCode Available | 0 | 5 |