Image to text

Papers

Showing 51–100 of 246 papers

Title	Date	Tasks	Status	Hype	Score
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text	Mar 25, 2025	Cross-Modal Retrieval, Hallucination	Code Available	1	5
MAGVLT: Masked Generative Vision-and-Language Transformer	Mar 21, 2023	Image Captioning, Image Generation	Code Available	1	5
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment	Feb 2, 2023	Attribute, Few-Shot Image Classification	Code Available	1	5
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs	Jun 19, 2022	Benchmarking, Image Captioning	Code Available	1	5
Linearly Mapping from Image to Text Space	Sep 30, 2022	Image Captioning, Image to text	Code Available	1	5
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation	Oct 20, 2020	Image to text, Natural Language Inference	Code Available	1	5
Improving Image Restoration through Removing Degradations in Textual Representations	Dec 28, 2023	Deblurring, Denoising	Code Available	1	5
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts	Feb 17, 2023	Image Retrieval, Image-text Classification	Code Available	1	5
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training	Jul 13, 2023	Image to text	Code Available	1	5
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding	Feb 8, 2025	Denoising, Image Generation	Code Available	1	5
Brain Captioning: Decoding human brain activity into images and text	May 19, 2023	Brain Decoding, Depth Estimation	Code Available	1	5
Distilled Dual-Encoder Model for Vision-Language Understanding	Dec 16, 2021	Image to text, model	Code Available	1	5
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?	Apr 16, 2024	Image Captioning, Image Generation	Code Available	1	5
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation	Oct 20, 2022	Decoder, Image Captioning	Code Available	1	5
Can MLLMs Perform Text-to-Image In-Context Learning?	Feb 2, 2024	Image Generation, Image to text	Code Available	1	5
What You See is What You Read? Improving Text-Image Alignment Evaluation	May 17, 2023	Image Generation, Image to text	Code Available	1	5
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles	Mar 5, 2025	Domain Adaptation, Image to text	Code Available	1	5
Language-Oriented Semantic Latent Representation for Image Transmission	May 16, 2024	Image to text, Semantic Communication	Code Available	1	5
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models	Nov 27, 2023	Cross-Modal Retrieval, Image Generation	Code Available	1	5
Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks	Aug 14, 2018	Image to text, Sentence	Code Available	0	5
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval	Jun 14, 2024	Image Retrieval, Image to text	Code Available	0	5
Towards a text-based quantitative and explainable histopathology image analysis	Jul 10, 2024	image-classification, Image Classification	Code Available	0	5
Exploration into Translation-Equivariant Image Quantization	Dec 1, 2021	Image Generation, Image to text	Code Available	0	5
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings	May 17, 2025	Image to text, Information Retrieval	Code Available	0	5
Self-Supervised Image-to-Text and Text-to-Image Synthesis	Dec 9, 2021	Image Generation, Image to text	Code Available	0	5
SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects	Nov 1, 2018	Image to text, Object	Code Available	0	5
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning	Jun 20, 2024	Diagnostic, Image to text	Code Available	0	5
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics	Dec 22, 2024	Abstractive Text Summarization, General Knowledge	Code Available	0	5
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)	Oct 25, 2024	Attribute, Image to text	Code Available	0	5
Delving into the Openness of CLIP	Jun 4, 2022	image-classification, Image Classification	Code Available	0	5
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models	Apr 21, 2023	Cross-Modal Retrieval, Image-text matching	Code Available	0	5
Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task	Oct 8, 2019	Cross-Modal Retrieval, Image to text	Code Available	0	5
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data	Mar 19, 2025	Image to text	Code Available	0	5
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs	Jan 5, 2024	Image Comprehension, Image to text	Code Available	0	5
Probing Multimodal Large Language Models for Global and Local Semantic Representations	Feb 27, 2024	Image to text, object-detection	Code Available	0	5
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search	Sep 28, 2023	cross-modal alignment, Cross-Modal Retrieval	Code Available	0	5
Pragmatic Radiology Report Generation	Nov 28, 2023	Image to text	Code Available	0	5
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Mar 20, 2025	Contrastive Learning, Cross-Modal Retrieval	Code Available	0	5
Adaptively Clustering Neighbor Elements for Image-Text Generation	Jan 5, 2023	Clustering, Decoder	Code Available	0	5
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning	Jun 11, 2024	Benchmarking, Contrastive Learning	Code Available	0	5
CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP	Dec 5, 2024	Anomaly Classification, Anomaly Detection	Code Available	0	5
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models	Jul 30, 2024	Image to text, Image-to-Text Retrieval	Code Available	0	5
Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions	Mar 10, 2018	Image Description, Image to text	Code Available	0	5
MultiQG-TI: Towards Question Generation from Multi-modal Sources	Jul 7, 2023	Image to text, Optical Character Recognition	Code Available	0	5
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval	Sep 18, 2023	Image to text, Person Retrieval	Code Available	0	5
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Jan 1, 2025	Contrastive Learning, Image Retrieval	Code Available	0	5
MirrorGAN: Learning Text-to-image Generation by Redescription	Mar 14, 2019	Diversity, Image Generation	Code Available	0	5
Multi-LLM Collaborative Caption Generation in Scientific Documents	Jan 5, 2025	Caption Generation, Image to text	Code Available	0	5
Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment	Apr 8, 2022	Image to text, Language Modeling	Code Available	0	5
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering	Dec 19, 2022	Chart Question Answering, Data Summarization	Code Available	0	5

No leaderboard results yet.