Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–100 of 246 papers

Title	Date	Tasks	Status	Hype
Multimodal Procedural Planning via Dual Text-Image Prompting	May 2, 2023	Image GenerationImage to text	CodeCode Available	1
MAGVLT: Masked Generative Vision-and-Language Transformer	Mar 21, 2023	Image CaptioningImage Generation	CodeCode Available	1
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation	Mar 11, 2023	Image CaptioningImage to text	CodeCode Available	1
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts	Feb 17, 2023	Image RetrievalImage-text Classification	CodeCode Available	1
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment	Feb 2, 2023	AttributeFew-Shot Image Classification	CodeCode Available	1
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models	Nov 9, 2022	Image GenerationImage to text	CodeCode Available	1
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation	Oct 20, 2022	DecoderImage Captioning	CodeCode Available	1
Linearly Mapping from Image to Text Space	Sep 30, 2022	Image CaptioningImage to text	CodeCode Available	1
FETA: Towards Specializing Foundation Models for Expert Task Applications	Sep 8, 2022	Domain GeneralizationFew-Shot Learning	CodeCode Available	1
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs	Jun 19, 2022	BenchmarkingImage Captioning	CodeCode Available	1
Write and Paint: Generative Vision-Language Models are Unified Modal Learners	Jun 15, 2022	Image GenerationImage to text	CodeCode Available	1
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation	Dec 31, 2021	Image CaptioningImage Generation	CodeCode Available	1
Distilled Dual-Encoder Model for Vision-Language Understanding	Dec 16, 2021	Image to textmodel	CodeCode Available	1
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic	Nov 29, 2021	Contrastive LearningDescriptive	CodeCode Available	1
L-Verse: Bidirectional Generation Between Image and Text	Nov 22, 2021	Image CaptioningImage Generation	CodeCode Available	1
Unifying Multimodal Transformer for Bi-directional Image and Text Generation	Oct 19, 2021	Image GenerationImage to text	CodeCode Available	1
Concadia: Towards Image-Based Text Generation with a Purpose	Apr 16, 2021	Image CaptioningImage to text	CodeCode Available	1
Progressive Transformer-Based Generation of Radiology Reports	Feb 19, 2021	Image to textText Generation	CodeCode Available	1
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation	Oct 20, 2020	Image to textNatural Language Inference	CodeCode Available	1
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration	Jun 12, 2025	cross-modal alignmentImage to text	—Unverified	0
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering	Jun 11, 2025	Chart Question AnsweringImage to text	—Unverified	0
TNG-CLIP:Training-Time Negation Data Generation for Negation Awareness of CLIP	May 24, 2025	Image CaptioningImage Generation	—Unverified	0
BRIT: Bidirectional Retrieval over Unified Image-Text Graph	May 24, 2025	Image to textQuestion Answering	—Unverified	0
Robustifying Vision-Language Models via Dynamic Token Reweighting	May 22, 2025	Image to text	—Unverified	0
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings	May 17, 2025	Image to textInformation Retrieval	CodeCode Available	0
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution	May 16, 2025	Cross-Modal RetrievalImage to text	—Unverified	0
X-Fusion: Introducing New Modality to Frozen Large Language Models	Apr 29, 2025	Image to text	—Unverified	0
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Apr 17, 2025	Cross-Modal RetrievalImage Retrieval	—Unverified	0
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation	Apr 16, 2025	Contrastive LearningImage to text	—Unverified	0
TMCIR: Token Merge Benefits Composed Image Retrieval	Apr 15, 2025	Contrastive Learningcross-modal alignment	—Unverified	0
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module	Mar 24, 2025	Image to textMedical Report Generation	—Unverified	0
Natural Language Generation	Mar 20, 2025	Image CaptioningImage to text	—Unverified	0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Mar 20, 2025	Contrastive LearningCross-Modal Retrieval	CodeCode Available	0
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data	Mar 19, 2025	Image to text	CodeCode Available	0
MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection	Mar 17, 2025	Anomaly DetectionForm	—Unverified	0
ABC: Achieving Better Control of Multimodal Embeddings using VLMs	Mar 1, 2025	Image to textImage-to-Text Retrieval	—Unverified	0
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation	Feb 26, 2025	Cross-Modal RetrievalHallucination	—Unverified	0
Natural Language Generation from Visual Sequences: Challenges and Future Directions	Feb 18, 2025	Image to textText Generation	—Unverified	0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models	Feb 18, 2025	Image to textOptical Character Recognition	CodeCode Available	0
UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation	Feb 16, 2025	Binary ClassificationFake News Detection	—Unverified	0
Multi-LLM Collaborative Caption Generation in Scientific Documents	Jan 5, 2025	Caption GenerationImage to text	CodeCode Available	0
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training	Jan 1, 2025	Image-text RetrievalImage to text	—Unverified	0
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation	Jan 1, 2025	image-classificationImage Classification	—Unverified	0
PromptHash:Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Jan 1, 2025	Contrastive LearningImage Retrieval	CodeCode Available	0
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics	Dec 22, 2024	Abstractive Text SummarizationGeneral Knowledge	CodeCode Available	0
CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP	Dec 5, 2024	Anomaly ClassificationAnomaly Detection	CodeCode Available	0
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding	Dec 2, 2024	Caption GenerationDomain Generalization	—Unverified	0
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation	Nov 23, 2024	Cross-Modal RetrievalImage to text	—Unverified	0
Everything is a Video: Unifying Modalities through Next-Frame Prediction	Nov 15, 2024	Caption GenerationCross-Modal Retrieval	—Unverified	0
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models	Nov 8, 2024	Image CaptioningImage Generation	—Unverified	0

Show:10 25 50

← PrevPage 2 of 5Next →

No leaderboard results yet.