Image to text

Papers

Showing 51–100 of 246 papers

Title	Date	Tasks	Status	Hype
Progressive Transformer-Based Generation of Radiology Reports	Feb 19, 2021	Image to text, Text Generation	Code Available	1
ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes	Mar 7, 2024	Image to text, Object	Code Available	1
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training	Nov 18, 2024	Data Augmentation, Image to text	Code Available	1
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models	Jun 10, 2025	Contrastive Learning, Image-text matching	Code Available	1
Multimodal Foundation Models For Echocardiogram Interpretation	Aug 29, 2023	Cross-Modal Retrieval, Diagnostic	Code Available	1
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models	Nov 27, 2023	Cross-Modal Retrieval, Image Generation	Code Available	1
L-Verse: Bidirectional Generation Between Image and Text	Nov 22, 2021	Image Captioning, Image Generation	Code Available	1
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles	Mar 5, 2025	Domain Adaptation, Image to text	Code Available	1
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training	Jul 13, 2023	Image to text	Code Available	1
Linearly Mapping from Image to Text Space	Sep 30, 2022	Image Captioning, Image to text	Code Available	1
Brain Captioning: Decoding human brain activity into images and text	May 19, 2023	Brain Decoding, Depth Estimation	Code Available	1
Distilled Dual-Encoder Model for Vision-Language Understanding	Dec 16, 2021	Image to text, model	Code Available	1
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment	Feb 2, 2023	Attribute, Few-Shot Image Classification	Code Available	1
Can MLLMs Perform Text-to-Image In-Context Learning?	Feb 2, 2024	Image Generation, Image to text	Code Available	1
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?	Apr 16, 2024	Image Captioning, Image Generation	Code Available	1
Language-Oriented Semantic Latent Representation for Image Transmission	May 16, 2024	Image to text, Semantic Communication	Code Available	1
MAGVLT: Masked Generative Vision-and-Language Transformer	Mar 21, 2023	Image Captioning, Image Generation	Code Available	1
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models	Nov 9, 2022	Image Generation, Image to text	Code Available	1
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs	Apr 11, 2025	Benchmarking, Image Generation	Code Available	1
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning	Aug 18, 2022	Image Generation, Image to text	Unverified	0
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding	Dec 2, 2024	Caption Generation, Domain Generalization	Unverified	0
DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models	Dec 12, 2023	Denoising, Diversity	Unverified	0
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models	Aug 16, 2024	Image to text	Unverified	0
DiffusionSTR: Diffusion Model for Scene Text Recognition	Jun 29, 2023	Image to text, model	Unverified	0
Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese	May 8, 2020	Image to text, Optical Character Recognition (OCR)	Unverified	0
Deductron -- A Recurrent Neural Network	Jun 23, 2018	Image to text, Optical Character Recognition (OCR)	Unverified	0
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation	Apr 16, 2025	Contrastive Learning, Image to text	Unverified	0
An Online Learning Approach to Prompt-based Selection of Generative Models	Oct 17, 2024	Image to text	Unverified	0
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training	Jan 1, 2025	Image-text Retrieval, Image to text	Unverified	0
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation	Nov 23, 2024	Cross-Modal Retrieval, Image to text	Unverified	0
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Jul 25, 2024	Image to text, Language Modeling	Unverified	0
Cross-modal Contrastive Attention Model for Medical Report Generation	Oct 1, 2022	Image to text, Medical Report Generation	Unverified	0
BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval	Mar 24, 2024	Diagnostic, Image Retrieval	Unverified	0
Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation	Sep 17, 2020	cross-modal alignment, Image to text	Unverified	0
Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval	Dec 4, 2023	Attribute, Cross-Modal Person Re-Identification	Unverified	0
BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification	Sep 9, 2023	Image to text, Language Modeling	Unverified	0
An End-to-End Neural Network for Image-to-Audio Transformation	Mar 10, 2023	Image to text, text-to-speech	Unverified	0
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval	Apr 15, 2022	Contrastive Learning, Cross-Modal Retrieval	Unverified	0
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training	Aug 22, 2023	image-classification, Image Classification	Unverified	0
Contrastive Learning of Visual-Semantic Embeddings	Oct 17, 2021	Contrastive Learning, image-classification	Unverified	0
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks	Nov 2, 2023	Image Generation, Image to text	Unverified	0
GPC: Generative and General Pathology Image Classifier	Jul 12, 2024	Classification, image-classification	Unverified	0
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation	Nov 18, 2023	Image to text, Semantic Similarity	Unverified	0
ABC: Achieving Better Control of Multimodal Embeddings using VLMs	Mar 1, 2025	Image to text, Image-to-Text Retrieval	Unverified	0
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs	Jan 5, 2024	Image Comprehension, Image to text	Unverified	0
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing	Nov 5, 2024	Change Detection, Contrastive Learning	Unverified	0
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics	Oct 24, 2024	Image to text, Image-Variation	Unverified	0
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution	May 16, 2025	Cross-Modal Retrieval, Image to text	Unverified	0
From Image to Text in Sentiment Analysis via Regression and Deep Learning	Sep 1, 2019	Image to text, regression	Unverified	0
CoBIT: A Contrastive Bi-directional Image-Text Generation Model	Mar 23, 2023	Decoder, Image Generation	Unverified	0
