| EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Dec 12, 2024 | Image Comprehension, Image Generation | —Unverified | 0 |
| COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework | Dec 11, 2024 | GPU, Language Modeling | —Unverified | 0 |
| DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | Dec 10, 2024 | Image Generation, Language Modelling | —Unverified | 0 |
| ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Dec 9, 2024 | Image Generation, Language Modeling | —Unverified | 0 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Dec 6, 2024 | document understanding, Hallucination | —Unverified | 0 |
| EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Dec 5, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM | Dec 5, 2024 | Image Manipulation, Language Modeling | —Unverified | 0 |
| DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | Dec 4, 2024 | Image Generation, Large Language Model | —Unverified | 0 |
| ObjectFinder: An Open-Vocabulary Assistive System for Interactive Object Search by Blind People | Dec 4, 2024 | Large Language Model, Multimodal Large Language Model | —Unverified | 0 |
| WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | Dec 3, 2024 | Diagnostic, Language Modeling | —Unverified | 0 |