Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–50 of 246 papers

Title	Date	Tasks	Status	Hype
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration	Jun 12, 2025	cross-modal alignmentImage to text	—Unverified	0
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering	Jun 11, 2025	Chart Question AnsweringImage to text	—Unverified	0
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models	Jun 10, 2025	Contrastive LearningImage-text matching	CodeCode Available	1
TNG-CLIP:Training-Time Negation Data Generation for Negation Awareness of CLIP	May 24, 2025	Image CaptioningImage Generation	—Unverified	0
BRIT: Bidirectional Retrieval over Unified Image-Text Graph	May 24, 2025	Image to textQuestion Answering	—Unverified	0
Robustifying Vision-Language Models via Dynamic Token Reweighting	May 22, 2025	Image to text	—Unverified	0
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings	May 17, 2025	Image to textInformation Retrieval	CodeCode Available	0
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution	May 16, 2025	Cross-Modal RetrievalImage to text	—Unverified	0
X-Fusion: Introducing New Modality to Frozen Large Language Models	Apr 29, 2025	Image to text	—Unverified	0
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Apr 17, 2025	Cross-Modal RetrievalImage Retrieval	—Unverified	0
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation	Apr 16, 2025	Contrastive LearningImage to text	—Unverified	0
TMCIR: Token Merge Benefits Composed Image Retrieval	Apr 15, 2025	Contrastive Learningcross-modal alignment	—Unverified	0
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs	Apr 11, 2025	BenchmarkingImage Generation	CodeCode Available	1
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text	Mar 25, 2025	Cross-Modal RetrievalHallucination	CodeCode Available	1
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module	Mar 24, 2025	Image to textMedical Report Generation	—Unverified	0
Natural Language Generation	Mar 20, 2025	Image CaptioningImage to text	—Unverified	0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Mar 20, 2025	Contrastive LearningCross-Modal Retrieval	CodeCode Available	0
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data	Mar 19, 2025	Image to text	CodeCode Available	0
MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection	Mar 17, 2025	Anomaly DetectionForm	—Unverified	0
FlowTok: Flowing Seamlessly Across Text and Image Tokens	Mar 13, 2025	DenoisingImage to text	CodeCode Available	5
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles	Mar 5, 2025	Domain AdaptationImage to text	CodeCode Available	1
ABC: Achieving Better Control of Multimodal Embeddings using VLMs	Mar 1, 2025	Image to textImage-to-Text Retrieval	—Unverified	0
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation	Feb 26, 2025	Cross-Modal RetrievalHallucination	—Unverified	0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models	Feb 18, 2025	Image to textOptical Character Recognition	CodeCode Available	0
Natural Language Generation from Visual Sequences: Challenges and Future Directions	Feb 18, 2025	Image to textText Generation	—Unverified	0
Magma: A Foundation Model for Multimodal AI Agents	Feb 18, 2025	Autonomous Web NavigationImage to text	CodeCode Available	5
UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation	Feb 16, 2025	Binary ClassificationFake News Detection	—Unverified	0
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding	Feb 8, 2025	DenoisingImage Generation	CodeCode Available	1
Multi-LLM Collaborative Caption Generation in Scientific Documents	Jan 5, 2025	Caption GenerationImage to text	CodeCode Available	0
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?	Jan 5, 2025	Image CaptioningImage to text	CodeCode Available	1
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training	Jan 1, 2025	Image-text RetrievalImage to text	—Unverified	0
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation	Jan 1, 2025	image-classificationImage Classification	—Unverified	0
PromptHash:Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Jan 1, 2025	Contrastive LearningImage Retrieval	CodeCode Available	0
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics	Dec 22, 2024	Abstractive Text SummarizationGeneral Knowledge	CodeCode Available	0
CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP	Dec 5, 2024	Anomaly ClassificationAnomaly Detection	CodeCode Available	0
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding	Dec 2, 2024	Caption GenerationDomain Generalization	—Unverified	0
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation	Nov 23, 2024	Cross-Modal RetrievalImage to text	—Unverified	0
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training	Nov 18, 2024	Data AugmentationImage to text	CodeCode Available	1
Everything is a Video: Unifying Modalities through Next-Frame Prediction	Nov 15, 2024	Caption GenerationCross-Modal Retrieval	—Unverified	0
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models	Nov 8, 2024	Image CaptioningImage Generation	—Unverified	0
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing	Nov 5, 2024	Change DetectionContrastive Learning	—Unverified	0
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization	Oct 30, 2024	Image to textImage-to-Text Retrieval	—Unverified	0
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval	Oct 28, 2024	Image RetrievalImage to text	CodeCode Available	2
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)	Oct 25, 2024	AttributeImage to text	CodeCode Available	0
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics	Oct 24, 2024	Image to textImage-Variation	—Unverified	0
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image	Oct 20, 2024	Image to text	—Unverified	0
An Online Learning Approach to Prompt-based Selection of Generative Models	Oct 17, 2024	Image to text	—Unverified	0
Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models	Oct 7, 2024	Image to text	—Unverified	0
Backdooring Vision-Language Models with Out-Of-Distribution Data	Oct 2, 2024	Image CaptioningImage to text	—Unverified	0
See then Tell: Enhancing Key Information Extraction with Vision Grounding	Sep 29, 2024	Image to textKey Information Extraction	—Unverified	0

Show:10 25 50

← PrevPage 1 of 5Next →

No leaderboard results yet.