Image to text

Papers

Showing 21–30 of 246 papers

Title	Date	Tasks	Status	Hype
GIT: A Generative Image-to-text Transformer for Vision and Language	May 27, 2022	Decoder, Image Captioning	Code Available	2
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models	Jun 10, 2025	Contrastive Learning, Image-text matching	Code Available	1
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs	Apr 11, 2025	Benchmarking, Image Generation	Code Available	1
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text	Mar 25, 2025	Cross-Modal Retrieval, Hallucination	Code Available	1
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles	Mar 5, 2025	Domain Adaptation, Image to text	Code Available	1
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding	Feb 8, 2025	Denoising, Image Generation	Code Available	1
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?	Jan 5, 2025	Image Captioning, Image to text	Code Available	1
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training	Nov 18, 2024	Data Augmentation, Image to text	Code Available	1
See or Guess: Counterfactually Regularized Image Captioning	Aug 29, 2024	Causal Inference, counterfactual	Code Available	1
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation	Aug 21, 2024	Image Generation, Image Retrieval	Code Available	1

No leaderboard results yet.