Image to text

Papers


Showing 76–100 of 246 papers

Title	Date	Tasks	Status
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution	May 16, 2025	Cross-Modal Retrieval, Image to text	Unverified
X-Fusion: Introducing New Modality to Frozen Large Language Models	Apr 29, 2025	Image to text	Unverified
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Apr 17, 2025	Cross-Modal Retrieval, Image Retrieval	Unverified
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation	Apr 16, 2025	Contrastive Learning, cross-modal alignment, Image to text	Unverified
TMCIR: Token Merge Benefits Composed Image Retrieval	Apr 15, 2025	Contrastive Learning, cross-modal alignment	Unverified
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module	Mar 24, 2025	Image to text, Medical Report Generation	Unverified
Natural Language Generation	Mar 20, 2025	Image Captioning, Image to text	Unverified
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Mar 20, 2025	Contrastive Learning, Cross-Modal Retrieval	Code Available
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data	Mar 19, 2025	Image to text	Code Available
MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection	Mar 17, 2025	Anomaly Detection, Form	Unverified
ABC: Achieving Better Control of Multimodal Embeddings using VLMs	Mar 1, 2025	Image to text, Image-to-Text Retrieval	Unverified
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation	Feb 26, 2025	Cross-Modal Retrieval, Hallucination	Unverified
Natural Language Generation from Visual Sequences: Challenges and Future Directions	Feb 18, 2025	Image to text, Text Generation	Unverified
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models	Feb 18, 2025	Image to text, Optical Character Recognition	Code Available
UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation	Feb 16, 2025	Binary Classification, Fake News Detection	Unverified
Multi-LLM Collaborative Caption Generation in Scientific Documents	Jan 5, 2025	Caption Generation, Image to text	Code Available
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training	Jan 1, 2025	Image-text Retrieval, Image to text	Unverified
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation	Jan 1, 2025	image-classification, Image Classification	Unverified
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Jan 1, 2025	Contrastive Learning, Image Retrieval	Code Available
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics	Dec 22, 2024	Abstractive Text Summarization, General Knowledge	Code Available
CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP	Dec 5, 2024	Anomaly Classification, Anomaly Detection	Code Available
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding	Dec 2, 2024	Caption Generation, Domain Generalization	Unverified
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation	Nov 23, 2024	Cross-Modal Retrieval, Image to text	Unverified
Everything is a Video: Unifying Modalities through Next-Frame Prediction	Nov 15, 2024	Caption Generation, Cross-Modal Retrieval	Unverified
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models	Nov 8, 2024	Image Captioning, Image Generation	Unverified

