| EmojiGAN: learning emojis distributions with a generative model | Oct 1, 2018 | Image CaptioningImage to text | —Unverified | 0 |
| Enhancing Vision-Language Pre-training with Rich Supervisions | Mar 5, 2024 | Image to textTable Detection | —Unverified | 0 |
| Evaluating authenticity and quality of image captions via sentiment and semantic analyses | Sep 14, 2024 | Image CaptioningImage to text | —Unverified | 0 |
| Every picture tells a story: Image-grounded controllable stylistic story generation | Sep 4, 2022 | Image CaptioningImage to text | —Unverified | 0 |
| Everything is a Video: Unifying Modalities through Next-Frame Prediction | Nov 15, 2024 | Caption GenerationCross-Modal Retrieval | —Unverified | 0 |
| Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | Mar 14, 2024 | Image to textOptical Character Recognition (OCR) | —Unverified | 0 |
| Faithful Chart Summarization with ChaTS-Pi | May 29, 2024 | Image to textSentence | —Unverified | 0 |
| Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval | Jun 11, 2024 | Image RetrievalImage to text | —Unverified | 0 |
| From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings | Jul 25, 2017 | ClusteringGeneral Classification | —Unverified | 0 |
| From Image to Text in Sentiment Analysis via Regression and Deep Learning | Sep 1, 2019 | Image to textregression | —Unverified | 0 |
| From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Nov 5, 2024 | Change DetectionContrastive Learning | —Unverified | 0 |
| GPC: Generative and General Pathology Image Classifier | Jul 12, 2024 | Classificationimage-classification | —Unverified | 0 |
| GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks | Nov 2, 2023 | Image GenerationImage to text | —Unverified | 0 |
| GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | Aug 22, 2023 | image-classificationImage Classification | —Unverified | 0 |
| Hierarchical Gumbel Attention Network for Text-based Person Search | Oct 10, 2020 | Image RetrievalImage to text | —Unverified | 0 |
| HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels | Jul 8, 2024 | Contrastive LearningImage Retrieval | —Unverified | 0 |
| I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation | Mar 20, 2017 | Caption GenerationData Augmentation | —Unverified | 0 |
| Illegible Text to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks | Oct 11, 2019 | Generative Adversarial NetworkImage-to-Image Translation | —Unverified | 0 |
| Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | Nov 8, 2024 | Image CaptioningImage Generation | —Unverified | 0 |
| Image Captioners Sometimes Tell More Than Images They See | May 4, 2023 | DescriptiveImage Captioning | —Unverified | 0 |
| Image Semantic Relation Generation | Oct 19, 2022 | Image RetrievalImage Segmentation | —Unverified | 0 |
| Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning | Feb 9, 2023 | Few-Shot LearningImage Captioning | —Unverified | 0 |
| Revisiting DETR Pre-training for Object Detection | Aug 2, 2023 | Image to textObject | —Unverified | 0 |
| Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | Sep 26, 2024 | Image to textImage-to-Text Retrieval | —Unverified | 0 |
| Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | Oct 30, 2024 | Image to textImage-to-Text Retrieval | —Unverified | 0 |