Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–25 of 246 papers

Title	Date	Tasks	Status	Hype
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration	Jun 12, 2025	cross-modal alignmentImage to text	—Unverified	0
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering	Jun 11, 2025	Chart Question AnsweringImage to text	—Unverified	0
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models	Jun 10, 2025	Contrastive LearningImage-text matching	CodeCode Available	1
TNG-CLIP:Training-Time Negation Data Generation for Negation Awareness of CLIP	May 24, 2025	Image CaptioningImage Generation	—Unverified	0
BRIT: Bidirectional Retrieval over Unified Image-Text Graph	May 24, 2025	Image to textQuestion Answering	—Unverified	0
Robustifying Vision-Language Models via Dynamic Token Reweighting	May 22, 2025	Image to text	—Unverified	0
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings	May 17, 2025	Image to textInformation Retrieval	CodeCode Available	0
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution	May 16, 2025	Cross-Modal RetrievalImage to text	—Unverified	0
X-Fusion: Introducing New Modality to Frozen Large Language Models	Apr 29, 2025	Image to text	—Unverified	0
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Apr 17, 2025	Cross-Modal RetrievalImage Retrieval	—Unverified	0
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation	Apr 16, 2025	Contrastive LearningImage to text	—Unverified	0
TMCIR: Token Merge Benefits Composed Image Retrieval	Apr 15, 2025	Contrastive Learningcross-modal alignment	—Unverified	0
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs	Apr 11, 2025	BenchmarkingImage Generation	CodeCode Available	1
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text	Mar 25, 2025	Cross-Modal RetrievalHallucination	CodeCode Available	1
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module	Mar 24, 2025	Image to textMedical Report Generation	—Unverified	0
Natural Language Generation	Mar 20, 2025	Image CaptioningImage to text	—Unverified	0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Mar 20, 2025	Contrastive LearningCross-Modal Retrieval	CodeCode Available	0
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data	Mar 19, 2025	Image to text	CodeCode Available	0
MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection	Mar 17, 2025	Anomaly DetectionForm	—Unverified	0
FlowTok: Flowing Seamlessly Across Text and Image Tokens	Mar 13, 2025	DenoisingImage to text	CodeCode Available	5
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles	Mar 5, 2025	Domain AdaptationImage to text	CodeCode Available	1
ABC: Achieving Better Control of Multimodal Embeddings using VLMs	Mar 1, 2025	Image to textImage-to-Text Retrieval	—Unverified	0
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation	Feb 26, 2025	Cross-Modal RetrievalHallucination	—Unverified	0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models	Feb 18, 2025	Image to textOptical Character Recognition	CodeCode Available	0
Natural Language Generation from Visual Sequences: Challenges and Future Directions	Feb 18, 2025	Image to textText Generation	—Unverified	0

Show:10 25 50

← PrevPage 1 of 10Next →

No leaderboard results yet.