Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 101–150 of 246 papers

Title	Date	Tasks	Status	Hype
Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models	Feb 13, 2024	Image CaptioningImage to text	—Unverified	0
Can MLLMs Perform Text-to-Image In-Context Learning?	Feb 2, 2024	Image GenerationImage to text	CodeCode Available	1
Dynamic Traceback Learning for Medical Report Generation	Jan 24, 2024	Image to textMedical Report Generation	—Unverified	0
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	BenchmarkingImage to text	CodeCode Available	1
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs	Jan 5, 2024	Image ComprehensionImage to text	—Unverified	0
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment	Jan 4, 2024	Image Captioningimage-classification	—Unverified	0
Accept the Modality Gap: An Exploration in the Hyperbolic Space	Jan 1, 2024	Image to textImage-to-Text Retrieval	—Unverified	0
Improving Image Restoration through Removing Degradations in Textual Representations	Dec 28, 2023	DeblurringDenoising	CodeCode Available	1
RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement	Dec 27, 2023	Computational EfficiencyImage Generation	—Unverified	0
DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models	Dec 12, 2023	DenoisingDiversity	—Unverified	0
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection	Dec 4, 2023	Image to textobject-detection	—Unverified	0
Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval	Dec 4, 2023	AttributeCross-Modal Person Re-Identification	—Unverified	0
Pragmatic Radiology Report Generation	Nov 28, 2023	Image to text	CodeCode Available	0
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models	Nov 27, 2023	Cross-Modal RetrievalImage Generation	CodeCode Available	1
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation	Nov 18, 2023	Image to textSemantic Similarity	—Unverified	0
Efficient End-to-End Visual Document Understanding with Rationale Distillation	Nov 16, 2023	document understandingImage to text	—Unverified	0
AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method	Nov 16, 2023	Image to textObject	—Unverified	0
Semantically Grounded QFormer for Efficient Vision Language Understanding	Nov 13, 2023	DiversityImage to text	—Unverified	0
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks	Nov 2, 2023	Image GenerationImage to text	—Unverified	0
UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web	Oct 22, 2023	Image to textLanguage Modeling	CodeCode Available	1
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning	Oct 12, 2023	Image CaptioningImage-text Retrieval	—Unverified	0
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing	Oct 12, 2023	Image GenerationImage to text	—Unverified	0
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition	Oct 8, 2023	Image to textOptical Character Recognition (OCR)	CodeCode Available	1
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API	Oct 7, 2023	Decoderdocument understanding	—Unverified	0
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency	Oct 5, 2023	Image GenerationImage to text	—Unverified	0
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering	Sep 29, 2023	Image to textPassage Retrieval	CodeCode Available	2
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search	Sep 28, 2023	cross-modal alignmentCross-Modal Retrieval	CodeCode Available	0
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution	Sep 25, 2023	Image to text	—Unverified	0
Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels	Sep 18, 2023	Generative Adversarial NetworkHandwriting Recognition	—Unverified	0
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval	Sep 18, 2023	Image to textPerson Retrieval	CodeCode Available	0
BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification	Sep 9, 2023	Image to textLanguage Modeling	—Unverified	0
Sequential Semantic Generative Communication for Progressive Text-to-Image Generation	Sep 8, 2023	Image GenerationImage to text	—Unverified	0
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning	Sep 5, 2023	DecoderImage Generation	CodeCode Available	2
Multimodal Foundation Models For Echocardiogram Interpretation	Aug 29, 2023	Cross-Modal RetrievalDiagnostic	CodeCode Available	1
Beyond One-to-One: Rethinking the Referring Image Segmentation	Aug 26, 2023	DecoderImage Segmentation	CodeCode Available	1
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages	Aug 23, 2023	Image GenerationImage to text	CodeCode Available	6
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training	Aug 22, 2023	image-classificationImage Classification	—Unverified	0
Vision-Language Dataset Distillation	Aug 15, 2023	Dataset Distillationimage-classification	CodeCode Available	1
Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval	Aug 8, 2023	Cross-Modal RetrievalImage Retrieval	CodeCode Available	1
Multimodal Neurons in Pretrained Text-Only Transformers	Aug 3, 2023	Image CaptioningImage to text	—Unverified	0
Revisiting DETR Pre-training for Object Detection	Aug 2, 2023	Image to textObject	—Unverified	0
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning	Jul 31, 2023	Caption GenerationHallucination	CodeCode Available	1
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports	Jul 24, 2023	Contrastive LearningImage to text	CodeCode Available	1
Towards a Visual-Language Foundation Model for Computational Pathology	Jul 24, 2023	Contrastive Learningimage-classification	—Unverified	0
Planting a SEED of Vision in Large Language Model	Jul 16, 2023	Image GenerationImage to text	CodeCode Available	2
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting	Jul 14, 2023	Cross-Modal RetrievalImage to text	—Unverified	0
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training	Jul 13, 2023	Image to text	CodeCode Available	1
Emu: Generative Pretraining in Multimodality	Jul 11, 2023	Image CaptioningImage Generation	CodeCode Available	3
MultiQG-TI: Towards Question Generation from Multi-modal Sources	Jul 7, 2023	Image to textOptical Character Recognition	CodeCode Available	0
Zero-shot Nuclei Detection via Visual-Language Pre-trained Models	Jun 30, 2023	Image to textobject-detection	CodeCode Available	0

Show:10 25 50

← PrevPage 3 of 5Next →

No leaderboard results yet.