Image to text

Papers

Showing 21–30 of 246 papers

Title	Date	Tasks	Status	Hype
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles	Mar 5, 2025	Domain Adaptation, Image to text	Code Available	1
ABC: Achieving Better Control of Multimodal Embeddings using VLMs	Mar 1, 2025	Image to text, Image-to-Text Retrieval	Unverified	0
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation	Feb 26, 2025	Cross-Modal Retrieval, Hallucination	Unverified	0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models	Feb 18, 2025	Image to text, Optical Character Recognition	Code Available	0
Natural Language Generation from Visual Sequences: Challenges and Future Directions	Feb 18, 2025	Image to text, Text Generation	Unverified	0
Magma: A Foundation Model for Multimodal AI Agents	Feb 18, 2025	Autonomous Web Navigation, Image to text	Code Available	5
UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation	Feb 16, 2025	Binary Classification, Fake News Detection	Unverified	0
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding	Feb 8, 2025	Denoising, Image Generation	Code Available	1
Multi-LLM Collaborative Caption Generation in Scientific Documents	Jan 5, 2025	Caption Generation, Image to text	Code Available	0
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?	Jan 5, 2025	Image Captioning, Image to text	Code Available	1

No leaderboard results yet.