| MultiQG-TI: Towards Question Generation from Multi-modal Sources | Jul 7, 2023 | Image to text, Optical Character Recognition | Code Available | 0 |
| Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation | May 23, 2024 | Image to text, Sentence | Code Available | 0 |
| Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Mar 10, 2018 | Image Description, Image to text | Code Available | 0 |
| Self-Supervised Image-to-Text and Text-to-Image Synthesis | Dec 9, 2021 | Image Generation, Image to text | Code Available | 0 |
| Multi-LLM Collaborative Caption Generation in Scientific Documents | Jan 5, 2025 | Caption Generation, Image to text | Code Available | 0 |
| BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Jun 14, 2024 | Image Retrieval, Image to text | Code Available | 0 |
| Exploration into Translation-Equivariant Image Quantization | Dec 1, 2021 | Image Generation, Image to text | Code Available | 0 |
| Zero-shot Nuclei Detection via Visual-Language Pre-trained Models | Jun 30, 2023 | Image to text, object-detection | Code Available | 0 |
| VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations | Apr 25, 2024 | Image to text, Sensitivity | Code Available | 0 |
| A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning | Jun 20, 2024 | Diagnostic, Image to text | Code Available | 0 |