SOTAVerified

Image to text

Papers

Showing 21-30 of 246 papers

Title | Status | Hype
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Code | 1
ABC: Achieving Better Control of Multimodal Embeddings using VLMs | - | 0
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation | - | 0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Code | 0
Natural Language Generation from Visual Sequences: Challenges and Future Directions | - | 0
Magma: A Foundation Model for Multimodal AI Agents | Code | 5
UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation | - | 0
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding | Code | 1
Multi-LLM Collaborative Caption Generation in Scientific Documents | Code | 0
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Code | 1

No leaderboard results yet.