Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 151–200 of 246 papers

Title	Date	Tasks	Status	Hype
DiffusionSTR: Diffusion Model for Scene Text Recognition	Jun 29, 2023	Image to textmodel	—Unverified	0
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models	Jun 13, 2023	Adversarial AttackDecoder	—Unverified	0
CapText: Large Language Model-based Caption Generation From Image Context and Description	Jun 1, 2023	Caption GenerationImage to text	—Unverified	0
Brain Captioning: Decoding human brain activity into images and text	May 19, 2023	Brain DecodingDepth Estimation	CodeCode Available	1
What You See is What You Read? Improving Text-Image Alignment Evaluation	May 17, 2023	Image GenerationImage to text	CodeCode Available	1
Category-Oriented Representation Learning for Image to Multi-Modal Retrieval	May 6, 2023	Cross-Modal RetrievalImage Retrieval	—Unverified	0
Image Captioners Sometimes Tell More Than Images They See	May 4, 2023	DescriptiveImage Captioning	—Unverified	0
Multimodal Procedural Planning via Dual Text-Image Prompting	May 2, 2023	Image GenerationImage to text	CodeCode Available	1
Interpreting Vision and Language Generative Models with Semantic Visual Priors	Apr 28, 2023	Image to text	—Unverified	0
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models	Apr 21, 2023	Cross-Modal RetrievalImage-text matching	CodeCode Available	0
Is Cross-modal Information Retrieval Possible without Training?	Apr 20, 2023	Contrastive LearningCross-Modal Information Retrieval	—Unverified	0
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models	Mar 30, 2023	Image to textPrompt Learning	—Unverified	0
CoBIT: A Contrastive Bi-directional Image-Text Generation Model	Mar 23, 2023	DecoderImage Generation	—Unverified	0
MAGVLT: Masked Generative Vision-and-Language Transformer	Mar 21, 2023	Image CaptioningImage Generation	CodeCode Available	1
Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling	Mar 13, 2023	DecoderImage to text	—Unverified	0
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale	Mar 12, 2023	AllImage Generation	CodeCode Available	3
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation	Mar 11, 2023	Image CaptioningImage to text	CodeCode Available	1
An End-to-End Neural Network for Image-to-Audio Transformation	Mar 10, 2023	Image to texttext-to-speech	—Unverified	0
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts	Feb 17, 2023	Image RetrievalImage-text Classification	CodeCode Available	1
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval	Feb 13, 2023	Cross-Modal Information RetrievalCross-Modal Retrieval	—Unverified	0
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning	Feb 9, 2023	Few-Shot LearningImage Captioning	—Unverified	0
Generative Diffusion Models on Graphs: Methods and Applications	Feb 6, 2023	DenoisingGraph Generation	CodeCode Available	2
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment	Feb 2, 2023	AttributeFew-Shot Image Classification	CodeCode Available	1
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	Jan 30, 2023	Generative Visual Question AnsweringImage Captioning	CodeCode Available	4
Adaptively Clustering Neighbor Elements for Image-Text Generation	Jan 5, 2023	ClusteringDecoder	CodeCode Available	0
SLAN: Self-Locator Aided Network for Vision-Language Understanding	Jan 1, 2023	Image RetrievalImage to text	—Unverified	0
Do DALL-E and Flamingo Understand Each Other?	Dec 23, 2022	Image CaptioningImage Generation	—Unverified	0
When are Lemons Purple? The Concept Association Bias of Vision-Language Models	Dec 22, 2022	Attributeimage-classification	—Unverified	0
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering	Dec 19, 2022	Chart Question AnsweringData Summarization	—Unverified	0
SLAN: Self-Locator Aided Network for Cross-Modal Understanding	Nov 28, 2022	Image RetrievalImage to text	—Unverified	0
Retrieval-Augmented Multimodal Language Modeling	Nov 22, 2022	Caption GenerationImage Captioning	—Unverified	0
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model	Nov 15, 2022	AllDisentanglement	CodeCode Available	6
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models	Nov 9, 2022	Image GenerationImage to text	CodeCode Available	1
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision	Oct 24, 2022	cross-modal alignmentCross-Modal Retrieval	—Unverified	0
Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards	Oct 21, 2022	Image to textnamed-entity-recognition	—Unverified	0
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation	Oct 20, 2022	DecoderImage Captioning	CodeCode Available	1
Image Semantic Relation Generation	Oct 19, 2022	Image RetrievalImage Segmentation	—Unverified	0
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding	Oct 7, 2022	Chart Question AnsweringDiversity	CodeCode Available	2
Cross-modal Contrastive Attention Model for Medical Report Generation	Oct 1, 2022	Image to textMedical Report Generation	—Unverified	0
Linearly Mapping from Image to Text Space	Sep 30, 2022	Image CaptioningImage to text	CodeCode Available	1
FETA: Towards Specializing Foundation Models for Expert Task Applications	Sep 8, 2022	Domain GeneralizationFew-Shot Learning	CodeCode Available	1
Every picture tells a story: Image-grounded controllable stylistic story generation	Sep 4, 2022	Image CaptioningImage to text	—Unverified	0
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning	Aug 18, 2022	Image GenerationImage to text	—Unverified	0
Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval	Jul 29, 2022	Cross-Modal RetrievalData Augmentation	—Unverified	0
SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification	Jul 1, 2022	Image to text	—Unverified	0
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs	Jun 19, 2022	BenchmarkingImage Captioning	CodeCode Available	1
Write and Paint: Generative Vision-Language Models are Unified Modal Learners	Jun 15, 2022	Image GenerationImage to text	CodeCode Available	1
Delving into the Openness of CLIP	Jun 4, 2022	image-classificationImage Classification	CodeCode Available	0
Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset	Jun 1, 2022	Caption Generationimage-classification	—Unverified	0
GIT: A Generative Image-to-text Transformer for Vision and Language	May 27, 2022	DecoderImage Captioning	CodeCode Available	2

Show:10 25 50

← PrevPage 4 of 5Next →

No leaderboard results yet.