| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 | 5 |
| FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain Generalization, Few-Shot Learning | Code Available | 1 | 5 |
| Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | Feb 2, 2023 | Attribute, Few-Shot Image Classification | Code Available | 1 | 5 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available | 1 | 5 |
| Improving Image Restoration through Removing Degradations in Textual Representations | Dec 28, 2023 | Deblurring, Denoising | Code Available | 1 | 5 |
| Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | May 29, 2024 | Dataset Generation, Image to text | Code Available | 1 | 5 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 | 5 |
| DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Mar 5, 2025 | Domain Adaptation, Image to text | Code Available | 1 | 5 |
| CMC-Bench: Towards a New Paradigm of Visual Signal Compression | Jun 13, 2024 | Image Compression, Image to text | Code Available | 1 | 5 |
| Can MLLMs Perform Text-to-Image In-Context Learning? | Feb 2, 2024 | Image Generation, Image to text | Code Available | 1 | 5 |