Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 101–125 of 246 papers

Title	Date	Tasks	Status	Score
Zero-shot Nuclei Detection via Visual-Language Pre-trained Models	Jun 30, 2023	Image to textobject-detection	CodeCode Available	5
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models	Jul 30, 2024	Image to textImage-to-Text Retrieval	CodeCode Available	5
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search	Sep 28, 2023	cross-modal alignmentCross-Modal Retrieval	CodeCode Available	5
Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions	Mar 10, 2018	Image DescriptionImage to text	CodeCode Available	5
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags	Jun 16, 2024	Image to textInstruction Following	—Unverified	0
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation	Jan 1, 2025	image-classificationImage Classification	—Unverified	0
Retrieval-Augmented Multimodal Language Modeling	Nov 22, 2022	Caption GenerationImage Captioning	—Unverified	0
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning	Feb 9, 2023	Few-Shot LearningImage Captioning	—Unverified	0
Revisiting DETR Pre-training for Object Detection	Aug 2, 2023	Image to textObject	—Unverified	0
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization	Sep 26, 2024	Image to textImage-to-Text Retrieval	—Unverified	0
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization	Oct 30, 2024	Image to textImage-to-Text Retrieval	—Unverified	0
Robustifying Vision-Language Models via Dynamic Token Reweighting	May 22, 2025	Image to text	—Unverified	0
See then Tell: Enhancing Key Information Extraction with Vision Grounding	Sep 29, 2024	Image to textKey Information Extraction	—Unverified	0
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Apr 17, 2025	Cross-Modal RetrievalImage Retrieval	—Unverified	0
Sequential Semantic Generative Communication for Progressive Text-to-Image Generation	Sep 8, 2023	Image GenerationImage to text	—Unverified	0
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing	Oct 12, 2023	Image GenerationImage to text	—Unverified	0
SLAN: Self-Locator Aided Network for Cross-Modal Understanding	Nov 28, 2022	Image RetrievalImage to text	—Unverified	0
SLAN: Self-Locator Aided Network for Vision-Language Understanding	Jan 1, 2023	Image RetrievalImage to text	—Unverified	0
SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification	Jul 1, 2022	Image to text	—Unverified	0
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution	Sep 25, 2023	Image to text	—Unverified	0
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval	May 16, 2021	Graph GenerationImage Captioning	—Unverified	0
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment	Jan 4, 2024	Image Captioningimage-classification	—Unverified	0
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image	Oct 20, 2024	Image to text	—Unverified	0
Synthesizing Novel Pairs of Image and Text	Dec 18, 2017	Image to text	—Unverified	0
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models	Mar 30, 2023	Image to textPrompt Learning	—Unverified	0

Show:10 25 50

← PrevPage 5 of 10Next →

No leaderboard results yet.