Image to text

Papers

Showing 151–175 of 246 papers

Title	Date	Tasks	Status	Hype
DiffusionSTR: Diffusion Model for Scene Text Recognition	Jun 29, 2023	Image to text, model	Unverified	0
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models	Jun 13, 2023	Adversarial Attack, Decoder	Unverified	0
CapText: Large Language Model-based Caption Generation From Image Context and Description	Jun 1, 2023	Caption Generation, Image to text	Unverified	0
Brain Captioning: Decoding human brain activity into images and text	May 19, 2023	Brain Decoding, Depth Estimation	Code Available	1
What You See is What You Read? Improving Text-Image Alignment Evaluation	May 17, 2023	Image Generation, Image to text	Code Available	1
Category-Oriented Representation Learning for Image to Multi-Modal Retrieval	May 6, 2023	Cross-Modal Retrieval, Image Retrieval	Unverified	0
Image Captioners Sometimes Tell More Than Images They See	May 4, 2023	Descriptive, Image Captioning	Unverified	0
Multimodal Procedural Planning via Dual Text-Image Prompting	May 2, 2023	Image Generation, Image to text	Code Available	1
Interpreting Vision and Language Generative Models with Semantic Visual Priors	Apr 28, 2023	Image to text	Unverified	0
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models	Apr 21, 2023	Cross-Modal Retrieval, Image-text matching	Code Available	0
Is Cross-modal Information Retrieval Possible without Training?	Apr 20, 2023	Contrastive Learning, Cross-Modal Information Retrieval	Unverified	0
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models	Mar 30, 2023	Image to text, Prompt Learning	Unverified	0
CoBIT: A Contrastive Bi-directional Image-Text Generation Model	Mar 23, 2023	Decoder, Image Generation	Unverified	0
MAGVLT: Masked Generative Vision-and-Language Transformer	Mar 21, 2023	Image Captioning, Image Generation	Code Available	1
Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling	Mar 13, 2023	Decoder, Image to text	Unverified	0
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale	Mar 12, 2023	All, Image Generation	Code Available	3
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation	Mar 11, 2023	Image Captioning, Image to text	Code Available	1
An End-to-End Neural Network for Image-to-Audio Transformation	Mar 10, 2023	Image to text, text-to-speech	Unverified	0
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts	Feb 17, 2023	Image Retrieval, Image-text Classification	Code Available	1
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval	Feb 13, 2023	Cross-Modal Information Retrieval, Cross-Modal Retrieval	Unverified	0
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning	Feb 9, 2023	Few-Shot Learning, Image Captioning	Unverified	0
Generative Diffusion Models on Graphs: Methods and Applications	Feb 6, 2023	Denoising, Graph Generation	Code Available	2
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment	Feb 2, 2023	Attribute, Few-Shot Image Classification	Code Available	1
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	Jan 30, 2023	Generative Visual Question Answering, Image Captioning	Code Available	4
Adaptively Clustering Neighbor Elements for Image-Text Generation	Jan 5, 2023	Clustering, Decoder	Code Available	0