SOTAVerified

Image to text

Papers

Showing 151–200 of 246 papers

| Title | Status | Hype |
| --- | --- | --- |
| RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement | | 0 |
| DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models | | 0 |
| Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval | | 0 |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | | 0 |
| Pragmatic Radiology Report Generation | Code | 0 |
| Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation | | 0 |
| AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method | | 0 |
| Efficient End-to-End Visual Document Understanding with Rationale Distillation | | 0 |
| Semantically Grounded QFormer for Efficient Vision Language Understanding | | 0 |
| GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks | | 0 |
| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | | 0 |
| SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing | | 0 |
| Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | | 0 |
| Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency | | 0 |
| Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0 |
| SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution | | 0 |
| CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval | Code | 0 |
| Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels | | 0 |
| BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification | | 0 |
| Sequential Semantic Generative Communication for Progressive Text-to-Image Generation | | 0 |
| GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | | 0 |
| Multimodal Neurons in Pretrained Text-Only Transformers | | 0 |
| Revisiting DETR Pre-training for Object Detection | | 0 |
| Towards a Visual-Language Foundation Model for Computational Pathology | | 0 |
| PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting | | 0 |
| MultiQG-TI: Towards Question Generation from Multi-modal Sources | Code | 0 |
| Zero-shot Nuclei Detection via Visual-Language Pre-trained Models | Code | 0 |
| DiffusionSTR: Diffusion Model for Scene Text Recognition | | 0 |
| I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models | | 0 |
| CapText: Large Language Model-based Caption Generation From Image Context and Description | | 0 |
| Category-Oriented Representation Learning for Image to Multi-Modal Retrieval | | 0 |
| Image Captioners Sometimes Tell More Than Images They See | | 0 |
| Interpreting Vision and Language Generative Models with Semantic Visual Priors | | 0 |
| RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models | Code | 0 |
| Is Cross-modal Information Retrieval Possible without Training? | | 0 |
| Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models | | 0 |
| CoBIT: A Contrastive Bi-directional Image-Text Generation Model | | 0 |
| Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling | | 0 |
| An End-to-End Neural Network for Image-to-Audio Transformation | | 0 |
| VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval | | 0 |
| Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning | | 0 |
| Adaptively Clustering Neighbor Elements for Image-Text Generation | Code | 0 |
| SLAN: Self-Locator Aided Network for Vision-Language Understanding | | 0 |
| Do DALL-E and Flamingo Understand Each Other? | | 0 |
| When are Lemons Purple? The Concept Association Bias of Vision-Language Models | | 0 |
| MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | Code | 0 |
| SLAN: Self-Locator Aided Network for Cross-Modal Understanding | | 0 |
| Retrieval-Augmented Multimodal Language Modeling | | 0 |
| Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | | 0 |
| Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards | Code | 0 |

No leaderboard results yet.