| Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | May 22, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Feb 13, 2024 | Language ModellingLarge Language Model | CodeCode Available | 2 | 5 |
| Jailbreaking Attack against Multimodal Large Language Model | Feb 4, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection | Nov 26, 2024 | 3D Object DetectionAutonomous Driving | CodeCode Available | 2 | 5 |
| CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Sep 28, 2024 | image-classificationImage Classification | CodeCode Available | 2 | 5 |
| Explore the Limits of Omni-modal Pretraining at Scale | Jun 13, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis | Aug 14, 2024 | Anomaly DetectionBoundary Detection | CodeCode Available | 2 | 5 |
| ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Jan 11, 2025 | Chart UnderstandingCode Generation | CodeCode Available | 2 | 5 |
| GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering | Feb 4, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| LLMGA: Multimodal Large Language Model based Generation Assistant | Nov 27, 2023 | Image GenerationLanguage Modeling | CodeCode Available | 2 | 5 |