| Multimodal Procedural Planning via Dual Text-Image Prompting | May 2, 2023 | Image GenerationImage to text | CodeCode Available | 1 |
| MAGVLT: Masked Generative Vision-and-Language Transformer | Mar 21, 2023 | Image CaptioningImage Generation | CodeCode Available | 1 |
| ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation | Mar 11, 2023 | Image CaptioningImage to text | CodeCode Available | 1 |
| Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts | Feb 17, 2023 | Image RetrievalImage-text Classification | CodeCode Available | 1 |
| Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | Feb 2, 2023 | AttributeFew-Shot Image Classification | CodeCode Available | 1 |
| Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models | Nov 9, 2022 | Image GenerationImage to text | CodeCode Available | 1 |
| Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation | Oct 20, 2022 | DecoderImage Captioning | CodeCode Available | 1 |
| Linearly Mapping from Image to Text Space | Sep 30, 2022 | Image CaptioningImage to text | CodeCode Available | 1 |
| FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain GeneralizationFew-Shot Learning | CodeCode Available | 1 |
| What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Jun 19, 2022 | BenchmarkingImage Captioning | CodeCode Available | 1 |