Image to text

Papers

Showing 201–246 of 246 papers

Title	Date	Tasks	Status
Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages	Nov 24, 2021	Decoder, Image to text	Unverified
Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation	Jul 8, 2024	Image to text, Lifelong learning	Unverified
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Apr 30, 2024	Caption Generation, Hallucination	Unverified
When are Lemons Purple? The Concept Association Bias of Vision-Language Models	Dec 22, 2022	Attribute, image-classification	Unverified
X-Fusion: Introducing New Modality to Frozen Large Language Models	Apr 29, 2025	Image to text	Unverified
15M Multimodal Facial Image-Text Dataset	Jul 11, 2024	Image to text	Unverified
RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement	Dec 27, 2023	Computational Efficiency, Image Generation	Unverified
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API	Oct 7, 2023	Decoder, document understanding	Unverified
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags	Jun 16, 2024	Image to text, Instruction Following	Unverified
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation	Jan 1, 2025	image-classification, Image Classification	Unverified
Retrieval-Augmented Multimodal Language Modeling	Nov 22, 2022	Caption Generation, Image Captioning	Unverified
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Mar 20, 2025	Contrastive Learning, Cross-Modal Retrieval	Code Available
PromptHash:Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Jan 1, 2025	Contrastive Learning, Image Retrieval	Code Available
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models	Feb 18, 2025	Image to text, Optical Character Recognition	Code Available
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data	Mar 19, 2025	Image to text	Code Available
MirrorGAN: Learning Text-to-image Generation by Redescription	Mar 14, 2019	Diversity, Image Generation	Code Available
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval	Sep 18, 2023	Image to text, Person Retrieval	Code Available
Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment	Apr 8, 2022	Image to text, Language Modeling	Code Available
Probing Multimodal Large Language Models for Global and Local Semantic Representations	Feb 27, 2024	Image to text, object-detection	Code Available
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings	May 17, 2025	Image to text, Information Retrieval	Code Available
Delving into the Openness of CLIP	Jun 4, 2022	image-classification, Image Classification	Code Available
Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks	Aug 14, 2018	Image to text, Sentence	Code Available
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)	Oct 25, 2024	Attribute, Image to text	Code Available
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search	Sep 28, 2023	cross-modal alignment, Cross-Modal Retrieval	Code Available
Adaptively Clustering Neighbor Elements for Image-Text Generation	Jan 5, 2023	Clustering, Decoder	Code Available
Towards a text-based quantitative and explainable histopathology image analysis	Jul 10, 2024	image-classification, Image Classification	Code Available
A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation	Oct 8, 2016	Handwriting Recognition, Image to text	Code Available
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models	Jul 30, 2024	Image to text, Image-to-Text Retrieval	Code Available
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models	Apr 21, 2023	Cross-Modal Retrieval, Image-text matching	Code Available
Pragmatic Radiology Report Generation	Nov 28, 2023	Image to text	Code Available
MultiQG-TI: Towards Question Generation from Multi-modal Sources	Jul 7, 2023	Image to text, Optical Character Recognition	Code Available
Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation	May 23, 2024	Image to text, Sentence	Code Available
Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions	Mar 10, 2018	Image Description, Image to text	Code Available
Self-Supervised Image-to-Text and Text-to-Image Synthesis	Dec 9, 2021	Image Generation, Image to text	Code Available
Multi-LLM Collaborative Caption Generation in Scientific Documents	Jan 5, 2025	Caption Generation, Image to text	Code Available
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval	Jun 14, 2024	Image Retrieval, Image to text	Code Available
Exploration into Translation-Equivariant Image Quantization	Dec 1, 2021	Image Generation, Image to text	Code Available
Zero-shot Nuclei Detection via Visual-Language Pre-trained Models	Jun 30, 2023	Image to text, object-detection	Code Available
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations	Apr 25, 2024	Image to text, Sensitivity	Code Available
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning	Jun 20, 2024	Diagnostic, Image to text	Code Available
SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects	Nov 1, 2018	Image to text, Object	Code Available
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning	Jun 11, 2024	Benchmarking, Contrastive Learning	Code Available
Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task	Oct 8, 2019	Cross-Modal Retrieval, Image to text	Code Available
Effective Use of Word Order for Text Categorization with Convolutional Neural Networks	Dec 1, 2014	General Classification, Image to text	Code Available
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics	Dec 22, 2024	Abstractive Text Summarization, General Knowledge	Code Available
CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP	Dec 5, 2024	Anomaly Classification, Anomaly Detection	Code Available
