| Retrieval-Augmented Multimodal Language Modeling | Nov 22, 2022 | Caption Generation, Image Captioning | —Unverified | 0 |
| Versatile Diffusion: Text, Images and Variations All in One Diffusion Model | Nov 15, 2022 | All, Disentanglement | Code Available | 6 |
| Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models | Nov 9, 2022 | Image Generation, Image to text | Code Available | 1 |
| Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Oct 24, 2022 | cross-modal alignment, Cross-Modal Retrieval | —Unverified | 0 |
| Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards | Oct 21, 2022 | Image to text, named-entity-recognition | —Unverified | 0 |
| Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation | Oct 20, 2022 | Decoder, Image Captioning | Code Available | 1 |
| Image Semantic Relation Generation | Oct 19, 2022 | Image Retrieval, Image Segmentation | —Unverified | 0 |
| Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | Oct 7, 2022 | Chart Question Answering, Diversity | Code Available | 2 |
| Cross-modal Contrastive Attention Model for Medical Report Generation | Oct 1, 2022 | Image to text, Medical Report Generation | —Unverified | 0 |
| Linearly Mapping from Image to Text Space | Sep 30, 2022 | Image Captioning, Image to text | Code Available | 1 |