SOTAVerified

Image to text

Papers

Showing 1-25 of 246 papers

| Title | Status | Hype |
|---|---|---|
| Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages | Code | 6 |
| Versatile Diffusion: Text, Images and Variations All in One Diffusion Model | Code | 6 |
| FlowTok: Flowing Seamlessly Across Text and Image Tokens | Code | 5 |
| Magma: A Foundation Model for Multimodal AI Agents | Code | 5 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Code | 4 |
| Evaluating Text-to-Visual Generation with Image-to-Text Generation | Code | 3 |
| Emu: Generative Pretraining in Multimodality | Code | 3 |
| One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale | Code | 3 |
| Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | Code | 2 |
| In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Code | 2 |
| Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities | Code | 2 |
| LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | Code | 2 |
| Libra: Building Decoupled Vision System on Large Language Models | Code | 2 |
| CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Code | 2 |
| From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models | Code | 2 |
| Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | Code | 2 |
| Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning | Code | 2 |
| Planting a SEED of Vision in Large Language Model | Code | 2 |
| Generative Diffusion Models on Graphs: Methods and Applications | Code | 2 |
| Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | Code | 2 |
| GIT: A Generative Image-to-text Transformer for Vision and Language | Code | 2 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1 |
| LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs | Code | 1 |
| LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Code | 1 |
| DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Code | 1 |

No leaderboard results yet.