Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 26–50 of 246 papers

Title	Date	Tasks	Status	Hype	Score
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment	Feb 2, 2023	AttributeFew-Shot Image Classification	CodeCode Available	1	5
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs	Apr 11, 2025	BenchmarkingImage Generation	CodeCode Available	1	5
Beyond One-to-One: Rethinking the Referring Image Segmentation	Aug 26, 2023	DecoderImage Segmentation	CodeCode Available	1	5
Concadia: Towards Image-Based Text Generation with a Purpose	Apr 16, 2021	Image CaptioningImage to text	CodeCode Available	1	5
Brain Captioning: Decoding human brain activity into images and text	May 19, 2023	Brain DecodingDepth Estimation	CodeCode Available	1	5
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?	Apr 16, 2024	Image CaptioningImage Generation	CodeCode Available	1	5
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports	Jul 24, 2023	Contrastive LearningImage to text	CodeCode Available	1	5
CMC-Bench: Towards a New Paradigm of Visual Signal Compression	Jun 13, 2024	Image CompressionImage to text	CodeCode Available	1	5
ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes	Mar 7, 2024	Image to textObject	CodeCode Available	1	5
Multimodal Foundation Models For Echocardiogram Interpretation	Aug 29, 2023	Cross-Modal RetrievalDiagnostic	CodeCode Available	1	5
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	BenchmarkingImage to text	CodeCode Available	1	5
MAGVLT: Masked Generative Vision-and-Language Transformer	Mar 21, 2023	Image CaptioningImage Generation	CodeCode Available	1	5
Multimodal Procedural Planning via Dual Text-Image Prompting	May 2, 2023	Image GenerationImage to text	CodeCode Available	1	5
FETA: Towards Specializing Foundation Models for Expert Task Applications	Sep 8, 2022	Domain GeneralizationFew-Shot Learning	CodeCode Available	1	5
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation	Oct 20, 2020	Image to textNatural Language Inference	CodeCode Available	1	5
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text	Mar 25, 2025	Cross-Modal RetrievalHallucination	CodeCode Available	1	5
Vision-Language Dataset Distillation	Aug 15, 2023	Dataset Distillationimage-classification	CodeCode Available	1	5
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design	May 29, 2024	Dataset GenerationImage to text	CodeCode Available	1	5
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?	Jan 5, 2025	Image CaptioningImage to text	CodeCode Available	1	5
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training	Nov 18, 2024	Data AugmentationImage to text	CodeCode Available	1	5
L-Verse: Bidirectional Generation Between Image and Text	Nov 22, 2021	Image CaptioningImage Generation	CodeCode Available	1	5
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models	Jun 10, 2025	Contrastive LearningImage-text matching	CodeCode Available	1	5
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles	Mar 5, 2025	Domain AdaptationImage to text	CodeCode Available	1	5
Can MLLMs Perform Text-to-Image In-Context Learning?	Feb 2, 2024	Image GenerationImage to text	CodeCode Available	1	5
Distilled Dual-Encoder Model for Vision-Language Understanding	Dec 16, 2021	Image to textmodel	CodeCode Available	1	5

Show:10 25 50

← PrevPage 2 of 10Next →

No leaderboard results yet.