Image to text

Papers

Showing 1–10 of 246 papers

Title	Date	Tasks	Status	Hype
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages	Aug 23, 2023	Image Generation, Image to text	Code Available	6
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model	Nov 15, 2022	All, Disentanglement	Code Available	6
FlowTok: Flowing Seamlessly Across Text and Image Tokens	Mar 13, 2025	Denoising, Image to text	Code Available	5
Magma: A Foundation Model for Multimodal AI Agents	Feb 18, 2025	Autonomous Web Navigation, Image to text	Code Available	5
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	Jan 30, 2023	Generative Visual Question Answering, Image Captioning	Code Available	4
Evaluating Text-to-Visual Generation with Image-to-Text Generation	Apr 1, 2024	Image to text, Question Answering	Code Available	3
Emu: Generative Pretraining in Multimodality	Jul 11, 2023	Image Captioning, Image Generation	Code Available	3
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale	Mar 12, 2023	All, Image Generation	Code Available	3
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval	Oct 28, 2024	Image Retrieval, Image to text	Code Available	2
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation	Aug 9, 2024	Image to text, Object	Code Available	2

No leaderboard results yet.