| CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Apr 4, 2024 | Attribute, Image Captioning | Code Available | 2 | 5 |
| Bootstrapping Vision-Language Learning with Decoupled Language Pre-training | Jul 13, 2023 | Image to text | Code Available | 1 | 5 |
| ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Dec 31, 2021 | Image Captioning, Image Generation | Code Available | 1 | 5 |
| Improving Image Restoration through Removing Degradations in Textual Representations | Dec 28, 2023 | Deblurring, Denoising | Code Available | 1 | 5 |
| DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Mar 5, 2025 | Domain Adaptation, Image to text | Code Available | 1 | 5 |
| FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain Generalization, Few-Shot Learning | Code Available | 1 | 5 |
| Beyond One-to-One: Rethinking the Referring Image Segmentation | Aug 26, 2023 | Decoder, Image Segmentation | Code Available | 1 | 5 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 | 5 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available | 1 | 5 |
| Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Oct 20, 2020 | Image to text, Natural Language Inference | Code Available | 1 | 5 |