Image to text

Papers


Showing 101–125 of 246 papers

Title	Date	Tasks	Status	Hype
Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models	Feb 13, 2024	Image Captioning, Image to text	Unverified	0
Can MLLMs Perform Text-to-Image In-Context Learning?	Feb 2, 2024	Image Generation, Image to text	Code Available	1
Dynamic Traceback Learning for Medical Report Generation	Jan 24, 2024	Image to text, Medical Report Generation	Unverified	0
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	Benchmarking, Image to text	Code Available	1
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs	Jan 5, 2024	Image Comprehension, Image to text	Code Available	0
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment	Jan 4, 2024	Image Captioning, image-classification	Unverified	0
Accept the Modality Gap: An Exploration in the Hyperbolic Space	Jan 1, 2024	Image to text, Image-to-Text Retrieval	Unverified	0
Improving Image Restoration through Removing Degradations in Textual Representations	Dec 28, 2023	Deblurring, Denoising	Code Available	1
RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement	Dec 27, 2023	Computational Efficiency, Image Generation	Unverified	0
DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models	Dec 12, 2023	Denoising, Diversity	Unverified	0
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection	Dec 4, 2023	Image to text, object-detection	Unverified	0
Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval	Dec 4, 2023	Attribute, Cross-Modal Person Re-Identification	Unverified	0
Pragmatic Radiology Report Generation	Nov 28, 2023	Image to text	Code Available	0
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models	Nov 27, 2023	Cross-Modal Retrieval, Image Generation	Code Available	1
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation	Nov 18, 2023	Image to text, Semantic Similarity	Unverified	0
Efficient End-to-End Visual Document Understanding with Rationale Distillation	Nov 16, 2023	document understanding, Image to text	Unverified	0
AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method	Nov 16, 2023	Image to text, Object	Unverified	0
Semantically Grounded QFormer for Efficient Vision Language Understanding	Nov 13, 2023	Diversity, Image to text	Unverified	0
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks	Nov 2, 2023	Image Generation, Image to text	Unverified	0
UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web	Oct 22, 2023	Image to text, Language Modeling	Code Available	1
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning	Oct 12, 2023	Image Captioning, Image-text Retrieval	Unverified	0
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing	Oct 12, 2023	Image Generation, Image to text	Unverified	0
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition	Oct 8, 2023	Image to text, Optical Character Recognition (OCR)	Code Available	1
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API	Oct 7, 2023	Decoder, document understanding	Unverified	0
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency	Oct 5, 2023	Image Generation, Image to text	Unverified	0


Page 5 of 10

No leaderboard results yet.