| J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | Dec 20, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Dec 19, 2024 | Contrastive Learning, Language Modeling | Code Available | 0 |
| Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | Dec 17, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | Dec 16, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| IDEA-Bench: How Far are Generative Models from Professional Designing? | Dec 16, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond | Dec 16, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Dec 12, 2024 | Image Comprehension, Image Generation | Unverified | 0 |
| Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Dec 12, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework | Dec 11, 2024 | GPU, Language Modeling | Unverified | 0 |
| DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | Dec 10, 2024 | Image Generation, Language Modelling | Unverified | 0 |