Image to text

Papers

Showing 176–200 of 246 papers

Title	Date	Tasks	Status	Hype
SLAN: Self-Locator Aided Network for Vision-Language Understanding	Jan 1, 2023	Image Retrieval, Image to text	Unverified	0
Do DALL-E and Flamingo Understand Each Other?	Dec 23, 2022	Image Captioning, Image Generation	Unverified	0
When are Lemons Purple? The Concept Association Bias of Vision-Language Models	Dec 22, 2022	Attribute, image-classification	Unverified	0
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering	Dec 19, 2022	Chart Question Answering, Data Summarization	Unverified	0
SLAN: Self-Locator Aided Network for Cross-Modal Understanding	Nov 28, 2022	Image Retrieval, Image to text	Unverified	0
Retrieval-Augmented Multimodal Language Modeling	Nov 22, 2022	Caption Generation, Image Captioning	Unverified	0
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model	Nov 15, 2022	All, Disentanglement	Code Available	6
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models	Nov 9, 2022	Image Generation, Image to text	Code Available	1
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision	Oct 24, 2022	cross-modal alignment, Cross-Modal Retrieval	Unverified	0
Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards	Oct 21, 2022	Image to text, named-entity-recognition	Unverified	0
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation	Oct 20, 2022	Decoder, Image Captioning	Code Available	1
Image Semantic Relation Generation	Oct 19, 2022	Image Retrieval, Image Segmentation	Unverified	0
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding	Oct 7, 2022	Chart Question Answering, Diversity	Code Available	2
Cross-modal Contrastive Attention Model for Medical Report Generation	Oct 1, 2022	Image to text, Medical Report Generation	Unverified	0
Linearly Mapping from Image to Text Space	Sep 30, 2022	Image Captioning, Image to text	Code Available	1
FETA: Towards Specializing Foundation Models for Expert Task Applications	Sep 8, 2022	Domain Generalization, Few-Shot Learning	Code Available	1
Every picture tells a story: Image-grounded controllable stylistic story generation	Sep 4, 2022	Image Captioning, Image to text	Unverified	0
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning	Aug 18, 2022	Image Generation, Image to text	Unverified	0
Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval	Jul 29, 2022	Cross-Modal Retrieval, Data Augmentation	Unverified	0
SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification	Jul 1, 2022	Image to text	Unverified	0
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs	Jun 19, 2022	Benchmarking, Image Captioning	Code Available	1
Write and Paint: Generative Vision-Language Models are Unified Modal Learners	Jun 15, 2022	Image Generation, Image to text	Code Available	1
Delving into the Openness of CLIP	Jun 4, 2022	image-classification, Image Classification	Code Available	0
Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset	Jun 1, 2022	Caption Generation, image-classification	Unverified	0
GIT: A Generative Image-to-text Transformer for Vision and Language	May 27, 2022	Decoder, Image Captioning	Code Available	2
