| LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | Jul 11, 2024 | Image RetrievalImage to text | CodeCode Available | 2 |
| Improving Image Restoration through Removing Degradations in Textual Representations | Dec 28, 2023 | DeblurringDenoising | CodeCode Available | 1 |
| Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Oct 20, 2020 | Image to textNatural Language Inference | CodeCode Available | 1 |
| Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Jan 5, 2025 | Image CaptioningImage to text | CodeCode Available | 1 |
| Beyond One-to-One: Rethinking the Referring Image Segmentation | Aug 26, 2023 | DecoderImage Segmentation | CodeCode Available | 1 |
| FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training | Nov 18, 2024 | Data AugmentationImage to text | CodeCode Available | 1 |
| LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Apr 16, 2024 | Image CaptioningImage Generation | CodeCode Available | 1 |
| ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Dec 31, 2021 | Image CaptioningImage Generation | CodeCode Available | 1 |
| Bootstrapping Vision-Language Learning with Decoupled Language Pre-training | Jul 13, 2023 | Image to text | CodeCode Available | 1 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive LearningImage-text matching | CodeCode Available | 1 |