| DiffusionSTR: Diffusion Model for Scene Text Recognition | Jun 29, 2023 | Image to text, model | Unverified | 0 |
| I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models | Jun 13, 2023 | Adversarial Attack, Decoder | Unverified | 0 |
| CapText: Large Language Model-based Caption Generation From Image Context and Description | Jun 1, 2023 | Caption Generation, Image to text | Unverified | 0 |
| Brain Captioning: Decoding human brain activity into images and text | May 19, 2023 | Brain Decoding, Depth Estimation | Code Available | 1 |
| What You See is What You Read? Improving Text-Image Alignment Evaluation | May 17, 2023 | Image Generation, Image to text | Code Available | 1 |
| Category-Oriented Representation Learning for Image to Multi-Modal Retrieval | May 6, 2023 | Cross-Modal Retrieval, Image Retrieval | Unverified | 0 |
| Image Captioners Sometimes Tell More Than Images They See | May 4, 2023 | Descriptive, Image Captioning | Unverified | 0 |
| Multimodal Procedural Planning via Dual Text-Image Prompting | May 2, 2023 | Image Generation, Image to text | Code Available | 1 |
| Interpreting Vision and Language Generative Models with Semantic Visual Priors | Apr 28, 2023 | Image to text | Unverified | 0 |
| RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models | Apr 21, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available | 0 |