Image to text

Papers

Showing 151–200 of 246 papers

Title	Date	Tasks	Status
EmojiGAN: learning emojis distributions with a generative model	Oct 1, 2018	Image Captioning, Image to text	Unverified
Enhancing Vision-Language Pre-training with Rich Supervisions	Mar 5, 2024	Image to text, Table Detection	Unverified
Evaluating authenticity and quality of image captions via sentiment and semantic analyses	Sep 14, 2024	Image Captioning, Image to text	Unverified
Every picture tells a story: Image-grounded controllable stylistic story generation	Sep 4, 2022	Image Captioning, Image to text	Unverified
Everything is a Video: Unifying Modalities through Next-Frame Prediction	Nov 15, 2024	Caption Generation, Cross-Modal Retrieval	Unverified
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation	Mar 14, 2024	Image to text, Optical Character Recognition (OCR)	Unverified
Faithful Chart Summarization with ChaTS-Pi	May 29, 2024	Image to text, Sentence	Unverified
Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval	Jun 11, 2024	Image Retrieval, Image to text	Unverified
From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings	Jul 25, 2017	Clustering, General Classification	Unverified
From Image to Text in Sentiment Analysis via Regression and Deep Learning	Sep 1, 2019	Image to text, regression	Unverified
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing	Nov 5, 2024	Change Detection, Contrastive Learning	Unverified
GPC: Generative and General Pathology Image Classifier	Jul 12, 2024	Classification, image-classification	Unverified
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks	Nov 2, 2023	Image Generation, Image to text	Unverified
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training	Aug 22, 2023	image-classification, Image Classification	Unverified
Hierarchical Gumbel Attention Network for Text-based Person Search	Oct 10, 2020	Image Retrieval, Image to text	Unverified
HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels	Jul 8, 2024	Contrastive Learning, Image Retrieval	Unverified
I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation	Mar 20, 2017	Caption Generation, Data Augmentation	Unverified
Illegible Text to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks	Oct 11, 2019	Generative Adversarial Network, Image-to-Image Translation	Unverified
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models	Nov 8, 2024	Image Captioning, Image Generation	Unverified
Image Captioners Sometimes Tell More Than Images They See	May 4, 2023	Descriptive, Image Captioning	Unverified
Image Semantic Relation Generation	Oct 19, 2022	Image Retrieval, Image Segmentation	Unverified
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning	Feb 9, 2023	Few-Shot Learning, Image Captioning	Unverified
Revisiting DETR Pre-training for Object Detection	Aug 2, 2023	Image to text, Object	Unverified
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization	Sep 26, 2024	Image to text, Image-to-Text Retrieval	Unverified
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization	Oct 30, 2024	Image to text, Image-to-Text Retrieval	Unverified
Robustifying Vision-Language Models via Dynamic Token Reweighting	May 22, 2025	Image to text	Unverified
See then Tell: Enhancing Key Information Extraction with Vision Grounding	Sep 29, 2024	Image to text, Key Information Extraction	Unverified
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Apr 17, 2025	Cross-Modal Retrieval, Image Retrieval	Unverified
Sequential Semantic Generative Communication for Progressive Text-to-Image Generation	Sep 8, 2023	Image Generation, Image to text	Unverified
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing	Oct 12, 2023	Image Generation, Image to text	Unverified
SLAN: Self-Locator Aided Network for Cross-Modal Understanding	Nov 28, 2022	Image Retrieval, Image to text	Unverified
SLAN: Self-Locator Aided Network for Vision-Language Understanding	Jan 1, 2023	Image Retrieval, Image to text	Unverified
SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification	Jul 1, 2022	Image to text	Unverified
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution	Sep 25, 2023	Image to text	Unverified
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval	May 16, 2021	Graph Generation, Image Captioning	Unverified
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment	Jan 4, 2024	Image Captioning, image-classification	Unverified
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image	Oct 20, 2024	Image to text	Unverified
Synthesizing Novel Pairs of Image and Text	Dec 18, 2017	Image to text	Unverified
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models	Mar 30, 2023	Image to text, Prompt Learning	Unverified
TMCIR: Token Merge Benefits Composed Image Retrieval	Apr 15, 2025	Contrastive Learning, cross-modal alignment	Unverified
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP	May 24, 2025	Image Captioning, Image Generation	Unverified
Towards a Visual-Language Foundation Model for Computational Pathology	Jul 24, 2023	Contrastive Learning, image-classification	Unverified
Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering	Jan 1, 2022	Generative Question Answering, Image to text	Unverified
TrojVLM: Backdoor Attack Against Vision Language Models	Sep 28, 2024	Backdoor Attack, Image Captioning	Unverified
Turbo Learning for Captionbot and Drawingbot	May 21, 2018	Image Captioning, Image Generation	Unverified
Two-stream Hierarchical Similarity Reasoning for Image-text Matching	Mar 10, 2022	Image-text matching, Image to text	Unverified
Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations	Apr 20, 2022	Cross-Modal Retrieval, Image Retrieval	Unverified
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning	May 26, 2024	Image to text, Image-to-Text Retrieval	Unverified
UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation	Feb 16, 2025	Binary Classification, Fake News Detection	Unverified
Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling	May 30, 2018	Image to text, Sentence	Unverified

No leaderboard results yet.