Image to text

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 151–200 of 246 papers

Title	Date	Tasks	Status
RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement	Dec 27, 2023	Computational EfficiencyImage Generation	—Unverified
DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models	Dec 12, 2023	DenoisingDiversity	—Unverified
Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval	Dec 4, 2023	AttributeCross-Modal Person Re-Identification	—Unverified
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection	Dec 4, 2023	Image to textobject-detection	—Unverified
Pragmatic Radiology Report Generation	Nov 28, 2023	Image to text	CodeCode Available
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation	Nov 18, 2023	Image to textSemantic Similarity	—Unverified
AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method	Nov 16, 2023	Image to textObject	—Unverified
Efficient End-to-End Visual Document Understanding with Rationale Distillation	Nov 16, 2023	document understandingImage to text	—Unverified
Semantically Grounded QFormer for Efficient Vision Language Understanding	Nov 13, 2023	DiversityImage to text	—Unverified
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks	Nov 2, 2023	Image GenerationImage to text	—Unverified
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning	Oct 12, 2023	Image CaptioningImage-text Retrieval	—Unverified
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing	Oct 12, 2023	Image GenerationImage to text	—Unverified
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API	Oct 7, 2023	Decoderdocument understanding	—Unverified
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency	Oct 5, 2023	Image GenerationImage to text	—Unverified
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search	Sep 28, 2023	cross-modal alignmentCross-Modal Retrieval	CodeCode Available
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution	Sep 25, 2023	Image to text	—Unverified
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval	Sep 18, 2023	Image to textPerson Retrieval	CodeCode Available
Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels	Sep 18, 2023	Generative Adversarial NetworkHandwriting Recognition	—Unverified
BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification	Sep 9, 2023	Image to textLanguage Modeling	—Unverified
Sequential Semantic Generative Communication for Progressive Text-to-Image Generation	Sep 8, 2023	Image GenerationImage to text	—Unverified
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training	Aug 22, 2023	image-classificationImage Classification	—Unverified
Multimodal Neurons in Pretrained Text-Only Transformers	Aug 3, 2023	Image CaptioningImage to text	—Unverified
Revisiting DETR Pre-training for Object Detection	Aug 2, 2023	Image to textObject	—Unverified
Towards a Visual-Language Foundation Model for Computational Pathology	Jul 24, 2023	Contrastive Learningimage-classification	—Unverified
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting	Jul 14, 2023	Cross-Modal RetrievalImage to text	—Unverified
MultiQG-TI: Towards Question Generation from Multi-modal Sources	Jul 7, 2023	Image to textOptical Character Recognition	CodeCode Available
Zero-shot Nuclei Detection via Visual-Language Pre-trained Models	Jun 30, 2023	Image to textobject-detection	CodeCode Available
DiffusionSTR: Diffusion Model for Scene Text Recognition	Jun 29, 2023	Image to textmodel	—Unverified
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models	Jun 13, 2023	Adversarial AttackDecoder	—Unverified
CapText: Large Language Model-based Caption Generation From Image Context and Description	Jun 1, 2023	Caption GenerationImage to text	—Unverified
Category-Oriented Representation Learning for Image to Multi-Modal Retrieval	May 6, 2023	Cross-Modal RetrievalImage Retrieval	—Unverified
Image Captioners Sometimes Tell More Than Images They See	May 4, 2023	DescriptiveImage Captioning	—Unverified
Interpreting Vision and Language Generative Models with Semantic Visual Priors	Apr 28, 2023	Image to text	—Unverified
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models	Apr 21, 2023	Cross-Modal RetrievalImage-text matching	CodeCode Available
Is Cross-modal Information Retrieval Possible without Training?	Apr 20, 2023	Contrastive LearningCross-Modal Information Retrieval	—Unverified
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models	Mar 30, 2023	Image to textPrompt Learning	—Unverified
CoBIT: A Contrastive Bi-directional Image-Text Generation Model	Mar 23, 2023	DecoderImage Generation	—Unverified
Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling	Mar 13, 2023	DecoderImage to text	—Unverified
An End-to-End Neural Network for Image-to-Audio Transformation	Mar 10, 2023	Image to texttext-to-speech	—Unverified
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval	Feb 13, 2023	Cross-Modal Information RetrievalCross-Modal Retrieval	—Unverified
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning	Feb 9, 2023	Few-Shot LearningImage Captioning	—Unverified
Adaptively Clustering Neighbor Elements for Image-Text Generation	Jan 5, 2023	ClusteringDecoder	CodeCode Available
SLAN: Self-Locator Aided Network for Vision-Language Understanding	Jan 1, 2023	Image RetrievalImage to text	—Unverified
Do DALL-E and Flamingo Understand Each Other?	Dec 23, 2022	Image CaptioningImage Generation	—Unverified
When are Lemons Purple? The Concept Association Bias of Vision-Language Models	Dec 22, 2022	Attributeimage-classification	—Unverified
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering	Dec 19, 2022	Chart Question AnsweringData Summarization	—Unverified
SLAN: Self-Locator Aided Network for Cross-Modal Understanding	Nov 28, 2022	Image RetrievalImage to text	—Unverified
Retrieval-Augmented Multimodal Language Modeling	Nov 22, 2022	Caption GenerationImage Captioning	—Unverified
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision	Oct 24, 2022	cross-modal alignmentCross-Modal Retrieval	—Unverified
Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards	Oct 21, 2022	Image to textnamed-entity-recognition	—Unverified

Show:10 25 50

← PrevPage 4 of 5Next →

No leaderboard results yet.