SOTAVerified

Image to text

Papers

Showing 21–30 of 246 papers

Title | Status | Hype
GIT: A Generative Image-to-text Transformer for Vision and Language | Code | 2
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs | Code | 1
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Code | 1
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Code | 1
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding | Code | 1
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Code | 1
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training | Code | 1
See or Guess: Counterfactually Regularized Image Captioning | Code | 1
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Code | 1

No leaderboard results yet.