| Improving Image Restoration through Removing Degradations in Textual Representations | Dec 28, 2023 | DeblurringDenoising | CodeCode Available | 1 | 5 |
| Concadia: Towards Image-Based Text Generation with a Purpose | Apr 16, 2021 | Image CaptioningImage to text | CodeCode Available | 1 | 5 |
| Language-Oriented Semantic Latent Representation for Image Transmission | May 16, 2024 | Image to textSemantic Communication | CodeCode Available | 1 | 5 |
| PRIOR: Prototype Representation Joint Learning from Medical Images and Reports | Jul 24, 2023 | Contrastive LearningImage to text | CodeCode Available | 1 | 5 |
| Brain Captioning: Decoding human brain activity into images and text | May 19, 2023 | Brain DecodingDepth Estimation | CodeCode Available | 1 | 5 |
| Multimodal Foundation Models For Echocardiogram Interpretation | Aug 29, 2023 | Cross-Modal RetrievalDiagnostic | CodeCode Available | 1 | 5 |
| CMC-Bench: Towards a New Paradigm of Visual Signal Compression | Jun 13, 2024 | Image CompressionImage to text | CodeCode Available | 1 | 5 |
| Multimodal Procedural Planning via Dual Text-Image Prompting | May 2, 2023 | Image GenerationImage to text | CodeCode Available | 1 | 5 |
| MAGVLT: Masked Generative Vision-and-Language Transformer | Mar 21, 2023 | Image CaptioningImage Generation | CodeCode Available | 1 | 5 |
| L-Verse: Bidirectional Generation Between Image and Text | Nov 22, 2021 | Image CaptioningImage Generation | CodeCode Available | 1 | 5 |
| Vision-Language Dataset Distillation | Aug 15, 2023 | Dataset Distillationimage-classification | CodeCode Available | 1 | 5 |
| ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | Mar 7, 2024 | Image to textObject | CodeCode Available | 1 | 5 |
| FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain GeneralizationFew-Shot Learning | CodeCode Available | 1 | 5 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | BenchmarkingImage to text | CodeCode Available | 1 | 5 |
| Linearly Mapping from Image to Text Space | Sep 30, 2022 | Image CaptioningImage to text | CodeCode Available | 1 | 5 |
| Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | May 29, 2024 | Dataset GenerationImage to text | CodeCode Available | 1 | 5 |
| FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training | Nov 18, 2024 | Data AugmentationImage to text | CodeCode Available | 1 | 5 |
| LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs | Apr 11, 2025 | BenchmarkingImage Generation | CodeCode Available | 1 | 5 |
| Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Jan 5, 2025 | Image CaptioningImage to text | CodeCode Available | 1 | 5 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive LearningImage-text matching | CodeCode Available | 1 | 5 |
| DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Mar 5, 2025 | Domain AdaptationImage to text | CodeCode Available | 1 | 5 |
| Can MLLMs Perform Text-to-Image In-Context Learning? | Feb 2, 2024 | Image GenerationImage to text | CodeCode Available | 1 | 5 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to textmodel | CodeCode Available | 1 | 5 |
| Beyond One-to-One: Rethinking the Referring Image Segmentation | Aug 26, 2023 | DecoderImage Segmentation | CodeCode Available | 1 | 5 |
| ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Dec 31, 2021 | Image CaptioningImage Generation | CodeCode Available | 1 | 5 |