SOTAVerified

Image to text

Papers

Showing 11-20 of 246 papers

Title | Status | Hype
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities | Code | 2
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | Code | 2
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | Code | 2
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | Code | 2
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | Code | 2
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Code | 2
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models | Code | 2
Generative Diffusion Models on Graphs: Methods and Applications | Code | 2
GIT: A Generative Image-to-text Transformer for Vision and Language | Code | 2
Libra: Building Decoupled Vision System on Large Language Models | Code | 2
Page 2 of 25

No leaderboard results yet.