Image to text

Papers


Showing 51–75 of 246 papers

Title	Date	Tasks	Status	Hype	Score
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment	Feb 2, 2023	Attribute, Few-Shot Image Classification	Code Available	1	5
Improving Image Restoration through Removing Degradations in Textual Representations	Dec 28, 2023	Deblurring, Denoising	Code Available	1	5
Multimodal Procedural Planning via Dual Text-Image Prompting	May 2, 2023	Image Generation, Image to text	Code Available	1	5
ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes	Mar 7, 2024	Image to text, Object	Code Available	1	5
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models	Jun 10, 2025	Contrastive Learning, Image-text matching	Code Available	1	5
See or Guess: Counterfactually Regularized Image Captioning	Aug 29, 2024	Causal Inference, counterfactual	Code Available	1	5
MAGVLT: Masked Generative Vision-and-Language Transformer	Mar 21, 2023	Image Captioning, Image Generation	Code Available	1	5
Write and Paint: Generative Vision-Language Models are Unified Modal Learners	Jun 15, 2022	Image Generation, Image to text	Code Available	1	5
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles	Mar 5, 2025	Domain Adaptation, Image to text	Code Available	1	5
Progressive Transformer-Based Generation of Radiology Reports	Feb 19, 2021	Image to text, Text Generation	Code Available	1	5
Brain Captioning: Decoding human brain activity into images and text	May 19, 2023	Brain Decoding, Depth Estimation	Code Available	1	5
Distilled Dual-Encoder Model for Vision-Language Understanding	Dec 16, 2021	Image to text, model	Code Available	1	5
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models	Nov 9, 2022	Image Generation, Image to text	Code Available	1	5
FETA: Towards Specializing Foundation Models for Expert Task Applications	Sep 8, 2022	Domain Generalization, Few-Shot Learning	Code Available	1	5
Can MLLMs Perform Text-to-Image In-Context Learning?	Feb 2, 2024	Image Generation, Image to text	Code Available	1	5
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation	Dec 31, 2021	Image Captioning, Image Generation	Code Available	1	5
Unifying Multimodal Transformer for Bi-directional Image and Text Generation	Oct 19, 2021	Image Generation, Image to text	Code Available	1	5
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training	Nov 18, 2024	Data Augmentation, Image to text	Code Available	1	5
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?	Jan 5, 2025	Image Captioning, Image to text	Code Available	1	5
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Jan 1, 2025	Contrastive Learning, Image Retrieval	Code Available	0	5
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Mar 20, 2025	Contrastive Learning, Cross-Modal Retrieval	Code Available	0	5
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models	Feb 18, 2025	Image to text, Optical Character Recognition	Code Available	0	5
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval	Jun 14, 2024	Image Retrieval, Image to text	Code Available	0	5
Pragmatic Radiology Report Generation	Nov 28, 2023	Image to text	Code Available	0	5
Probing Multimodal Large Language Models for Global and Local Semantic Representations	Feb 27, 2024	Image to text, object-detection	Code Available	0	5

