| DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models | Dec 12, 2023 | Denoising, Diversity | Unverified | 0 |
| DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | Dec 2, 2024 | Caption Generation, Domain Generalization | Unverified | 0 |
| Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning | Aug 18, 2022 | Image Generation, Image to text | Unverified | 0 |
| Doc2Im: document to image conversion through self-attentive embedding | Nov 8, 2018 | Document To Image Conversion, document understanding | Unverified | 0 |
| DOCCI: Descriptions of Connected and Contrasting Images | Apr 30, 2024 | Image Generation, Image to text | Unverified | 0 |
| Do DALL-E and Flamingo Understand Each Other? | Dec 23, 2022 | Image Captioning, Image Generation | Unverified | 0 |
| Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection | Apr 15, 2024 | Anomaly Detection, Anomaly Localization | Unverified | 0 |
| Dynamic Traceback Learning for Medical Report Generation | Jan 24, 2024 | Image to text, Medical Report Generation | Unverified | 0 |
| Efficient End-to-End Visual Document Understanding with Rationale Distillation | Nov 16, 2023 | document understanding, Image to text | Unverified | 0 |
| EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval | Jan 1, 2022 | Causal Inference, Contrastive Learning | Unverified | 0 |