| Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | Oct 1, 2023 | Language ModelingLanguage Modelling | —Unverified | 0 |
| CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step | Jul 6, 2025 | DenoisingLarge Language Model | —Unverified | 0 |
| CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model | Nov 19, 2024 | Information RetrievalLanguage Modeling | —Unverified | 0 |
| Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Jul 25, 2024 | Image to textLanguage Modeling | —Unverified | 0 |
| Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | Sep 18, 2024 | Image CaptioningLarge Language Model | —Unverified | 0 |
| DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | Dec 10, 2024 | Image GenerationLanguage Modelling | —Unverified | 0 |
| Distraction is All You Need for Multimodal Large Language Model Jailbreaking | Feb 15, 2025 | AllLanguage Modeling | —Unverified | 0 |
| DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing | Sep 2, 2024 | Image GenerationLanguage Modelling | —Unverified | 0 |
| DreamJourney: Perpetual View Generation with Video Diffusion Models | Jun 21, 2025 | Image to 3DLarge Language Model | —Unverified | 0 |
| DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | Dec 4, 2024 | Image GenerationLarge Language Model | —Unverified | 0 |