Image to text

Papers

Showing 51–75 of 246 papers

Title	Date	Tasks	Status	Hype
Multimodal Procedural Planning via Dual Text-Image Prompting	May 2, 2023	Image Generation, Image to text	Code Available	1
MAGVLT: Masked Generative Vision-and-Language Transformer	Mar 21, 2023	Image Captioning, Image Generation	Code Available	1
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation	Mar 11, 2023	Image Captioning, Image to text	Code Available	1
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts	Feb 17, 2023	Image Retrieval, Image-text Classification	Code Available	1
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment	Feb 2, 2023	Attribute, Few-Shot Image Classification	Code Available	1
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models	Nov 9, 2022	Image Generation, Image to text	Code Available	1
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation	Oct 20, 2022	Decoder, Image Captioning	Code Available	1
Linearly Mapping from Image to Text Space	Sep 30, 2022	Image Captioning, Image to text	Code Available	1
FETA: Towards Specializing Foundation Models for Expert Task Applications	Sep 8, 2022	Domain Generalization, Few-Shot Learning	Code Available	1
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs	Jun 19, 2022	Benchmarking, Image Captioning	Code Available	1
Write and Paint: Generative Vision-Language Models are Unified Modal Learners	Jun 15, 2022	Image Generation, Image to text	Code Available	1
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation	Dec 31, 2021	Image Captioning, Image Generation	Code Available	1
Distilled Dual-Encoder Model for Vision-Language Understanding	Dec 16, 2021	Image to text, model	Code Available	1
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic	Nov 29, 2021	Contrastive Learning, Descriptive	Code Available	1
L-Verse: Bidirectional Generation Between Image and Text	Nov 22, 2021	Image Captioning, Image Generation	Code Available	1
Unifying Multimodal Transformer for Bi-directional Image and Text Generation	Oct 19, 2021	Image Generation, Image to text	Code Available	1
Concadia: Towards Image-Based Text Generation with a Purpose	Apr 16, 2021	Image Captioning, Image to text	Code Available	1
Progressive Transformer-Based Generation of Radiology Reports	Feb 19, 2021	Image to text, Text Generation	Code Available	1
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation	Oct 20, 2020	Image to text, Natural Language Inference	Code Available	1
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration	Jun 12, 2025	cross-modal alignment, Image to text	Unverified	0
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering	Jun 11, 2025	Chart Question Answering, Image to text	Unverified	0
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP	May 24, 2025	Image Captioning, Image Generation	Unverified	0
BRIT: Bidirectional Retrieval over Unified Image-Text Graph	May 24, 2025	Image to text, Question Answering	Unverified	0
Robustifying Vision-Language Models via Dynamic Token Reweighting	May 22, 2025	Image to text	Unverified	0
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings	May 17, 2025	Image to text, Information Retrieval	Code Available	0
