Image to text

Papers


Showing 101–150 of 246 papers

Title	Date	Tasks	Status	Score
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models	Jul 30, 2024	Image to text, Image-to-Text Retrieval	Code Available	5
Adaptively Clustering Neighbor Elements for Image-Text Generation	Jan 5, 2023	Clustering, Decoder	Code Available	5
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search	Sep 28, 2023	cross-modal alignment, Cross-Modal Retrieval	Code Available	5
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics	Dec 22, 2024	Abstractive Text Summarization, General Knowledge	Code Available	5
Multi-LLM Collaborative Caption Generation in Scientific Documents	Jan 5, 2025	Caption Generation, Image to text	Code Available	5
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings	May 17, 2025	Image to text, Information Retrieval	Code Available	5
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval	Jun 14, 2024	Image Retrieval, Image to text	Code Available	5
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization	Oct 30, 2024	Image to text, Image-to-Text Retrieval	Unverified	0
Robustifying Vision-Language Models via Dynamic Token Reweighting	May 22, 2025	Image to text	Unverified	0
See then Tell: Enhancing Key Information Extraction with Vision Grounding	Sep 29, 2024	Image to text, Key Information Extraction	Unverified	0
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Apr 17, 2025	Cross-Modal Retrieval, Image Retrieval	Unverified	0
Sequential Semantic Generative Communication for Progressive Text-to-Image Generation	Sep 8, 2023	Image Generation, Image to text	Unverified	0
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing	Oct 12, 2023	Image Generation, Image to text	Unverified	0
SLAN: Self-Locator Aided Network for Cross-Modal Understanding	Nov 28, 2022	Image Retrieval, Image to text	Unverified	0
SLAN: Self-Locator Aided Network for Vision-Language Understanding	Jan 1, 2023	Image Retrieval, Image to text	Unverified	0
SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification	Jul 1, 2022	Image to text	Unverified	0
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution	Sep 25, 2023	Image to text	Unverified	0
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval	May 16, 2021	Graph Generation, Image Captioning	Unverified	0
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment	Jan 4, 2024	Image Captioning, image-classification	Unverified	0
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image	Oct 20, 2024	Image to text	Unverified	0
Synthesizing Novel Pairs of Image and Text	Dec 18, 2017	Image to text	Unverified	0
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models	Mar 30, 2023	Image to text, Prompt Learning	Unverified	0
TMCIR: Token Merge Benefits Composed Image Retrieval	Apr 15, 2025	Contrastive Learning, cross-modal alignment	Unverified	0
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP	May 24, 2025	Image Captioning, Image Generation	Unverified	0
Towards a Visual-Language Foundation Model for Computational Pathology	Jul 24, 2023	Contrastive Learning, image-classification	Unverified	0
Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering	Jan 1, 2022	Generative Question Answering, Image to text	Unverified	0
TrojVLM: Backdoor Attack Against Vision Language Models	Sep 28, 2024	Backdoor Attack, Image Captioning	Unverified	0
Turbo Learning for Captionbot and Drawingbot	May 21, 2018	Image Captioning, Image Generation	Unverified	0
Two-stream Hierarchical Similarity Reasoning for Image-text Matching	Mar 10, 2022	Image-text matching, Image to text	Unverified	0
Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations	Apr 20, 2022	Cross-Modal Retrieval, Image Retrieval	Unverified	0
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning	May 26, 2024	Image to text, Image-to-Text Retrieval	Unverified	0
UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation	Feb 16, 2025	Binary Classification, Fake News Detection	Unverified	0
Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling	May 30, 2018	Image to text, Sentence	Unverified	0
Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages	Nov 24, 2021	Decoder, Image to text	Unverified	0
Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation	Jul 8, 2024	Image to text, Lifelong learning	Unverified	0
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Apr 30, 2024	Caption Generation, Hallucination	Unverified	0
When are Lemons Purple? The Concept Association Bias of Vision-Language Models	Dec 22, 2022	Attribute, image-classification	Unverified	0
X-Fusion: Introducing New Modality to Frozen Large Language Models	Apr 29, 2025	Image to text	Unverified	0
15M Multimodal Facial Image-Text Dataset	Jul 11, 2024	Image to text	Unverified	0
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning	Oct 12, 2023	Image Captioning, Image-text Retrieval	Unverified	0
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution	May 16, 2025	Cross-Modal Retrieval, Image to text	Unverified	0
ABC: Achieving Better Control of Multimodal Embeddings using VLMs	Mar 1, 2025	Image to text, Image-to-Text Retrieval	Unverified	0
Accept the Modality Gap: An Exploration in the Hyperbolic Space	Jan 1, 2024	Image to text, Image-to-Text Retrieval	Unverified	0
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training	Jan 1, 2025	Image-text Retrieval, Image to text	Unverified	0
AICoderEval: Improving AI Domain Code Generation of Large Language Models	Jun 7, 2024	Code Generation, Image to text	Unverified	0
AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method	Nov 16, 2023	Image to text, Object	Unverified	0
An End-to-End Neural Network for Image-to-Audio Transformation	Mar 10, 2023	Image to text, text-to-speech	Unverified	0
An Online Learning Approach to Prompt-based Selection of Generative Models	Oct 17, 2024	Image to text	Unverified	0
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models	Aug 16, 2024	Image to text	Unverified	0
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering	Jan 14, 2022	Generative Question Answering, Image to text	Unverified	0


Page 3 of 5

No leaderboard results yet.