| LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Apr 16, 2024 | Image Captioning, Image Generation | Code Available | 1 |
| CMC-Bench: Towards a New Paradigm of Visual Signal Compression | Jun 13, 2024 | Image Compression, Image to text | Code Available | 1 |
| Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Oct 20, 2020 | Image to text, Natural Language Inference | Code Available | 1 |
| Language-Oriented Semantic Latent Representation for Image Transmission | May 16, 2024 | Image to text, Semantic Communication | Code Available | 1 |
| Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Jan 5, 2025 | Image Captioning, Image to text | Code Available | 1 |
| Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | May 29, 2024 | Dataset Generation, Image to text | Code Available | 1 |
| FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training | Nov 18, 2024 | Data Augmentation, Image to text | Code Available | 1 |
| ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Dec 31, 2021 | Image Captioning, Image Generation | Code Available | 1 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available | 1 |