Image to text

Papers

Showing 11–20 of 246 papers

Title	Date	Tasks	Status	Hype	Score
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning	Sep 5, 2023	Decoder, Image Generation	Code Available	2	5
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval	Jul 11, 2024	Image Retrieval, Image to text	Code Available	2	5
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering	Sep 29, 2023	Image to text, Passage Retrieval	Code Available	2	5
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding	Oct 7, 2022	Chart Question Answering, Diversity	Code Available	2	5
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation	Aug 9, 2024	Image to text, Object	Code Available	2	5
Libra: Building Decoupled Vision System on Large Language Models	May 16, 2024	Image to text, Language Modeling	Code Available	2	5
Generative Diffusion Models on Graphs: Methods and Applications	Feb 6, 2023	Denoising, Graph Generation	Code Available	2	5
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities	Jul 29, 2024	Contrastive Learning, DeepFake Detection	Code Available	2	5
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models	Apr 1, 2024	Graph Generation, Image to text	Code Available	2	5
GIT: A Generative Image-to-text Transformer for Vision and Language	May 27, 2022	Decoder, Image Captioning	Code Available	2	5

No leaderboard results yet.