Image to text

Papers

Showing 176–200 of 246 papers

Title	Date	Tasks	Status
Robustifying Vision-Language Models via Dynamic Token Reweighting	May 22, 2025	Image to text	Unverified
See then Tell: Enhancing Key Information Extraction with Vision Grounding	Sep 29, 2024	Image to text, Key Information Extraction	Unverified
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Apr 17, 2025	Cross-Modal Retrieval, Image Retrieval	Unverified
Sequential Semantic Generative Communication for Progressive Text-to-Image Generation	Sep 8, 2023	Image Generation, Image to text	Unverified
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing	Oct 12, 2023	Image Generation, Image to text	Unverified
SLAN: Self-Locator Aided Network for Cross-Modal Understanding	Nov 28, 2022	Image Retrieval, Image to text	Unverified
SLAN: Self-Locator Aided Network for Vision-Language Understanding	Jan 1, 2023	Image Retrieval, Image to text	Unverified
SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification	Jul 1, 2022	Image to text	Unverified
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution	Sep 25, 2023	Image to text	Unverified
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval	May 16, 2021	Graph Generation, Image Captioning	Unverified
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment	Jan 4, 2024	Image Captioning, image-classification	Unverified
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image	Oct 20, 2024	Image to text	Unverified
Synthesizing Novel Pairs of Image and Text	Dec 18, 2017	Image to text	Unverified
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models	Mar 30, 2023	Image to text, Prompt Learning	Unverified
TMCIR: Token Merge Benefits Composed Image Retrieval	Apr 15, 2025	Contrastive Learning, cross-modal alignment	Unverified
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP	May 24, 2025	Image Captioning, Image Generation	Unverified
Towards a Visual-Language Foundation Model for Computational Pathology	Jul 24, 2023	Contrastive Learning, image-classification	Unverified
Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering	Jan 1, 2022	Generative Question Answering, Image to text	Unverified
TrojVLM: Backdoor Attack Against Vision Language Models	Sep 28, 2024	Backdoor Attack, Image Captioning	Unverified
Turbo Learning for Captionbot and Drawingbot	May 21, 2018	Image Captioning, Image Generation	Unverified
Two-stream Hierarchical Similarity Reasoning for Image-text Matching	Mar 10, 2022	Image-text matching, Image to text	Unverified
Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations	Apr 20, 2022	Cross-Modal Retrieval, Image Retrieval	Unverified
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning	May 26, 2024	Image to text, Image-to-Text Retrieval	Unverified
UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation	Feb 16, 2025	Binary Classification, Fake News Detection	Unverified
Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling	May 30, 2018	Image to text, Sentence	Unverified

No leaderboard results yet.