SOTAVerified

Image to text

Papers

Showing 201-246 of 246 papers

Title | Status | Hype
Hierarchical Gumbel Attention Network for Text-based Person Search |  | 0
HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels |  | 0
I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation |  | 0
Illegible Text to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks |  | 0
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models |  | 0
Image Captioners Sometimes Tell More Than Images They See |  | 0
Image Semantic Relation Generation |  | 0
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module |  | 0
Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything |  | 0
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation |  | 0
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration |  | 0
Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling |  | 0
Instruction Tuning-free Visual Token Complement for Multimodal LLMs |  | 0
Interpreting Vision and Language Generative Models with Semantic Visual Priors |  | 0
Is Cross-modal Information Retrieval Possible without Training? |  | 0
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models |  | 0
Knowledge Aware Semantic Concept Expansion for Image-Text Matching |  | 0
Knowledge driven Description Synthesis for Floor Plan Interpretation |  | 0
Semantically Grounded QFormer for Efficient Vision Language Understanding |  | 0
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision |  | 0
Learning Deep Structure-Preserving Image-Text Embeddings |  | 0
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection |  | 0
Leveraging AI to Generate Audio for User-generated Content in Video Games |  | 0
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency |  | 0
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant |  | 0
MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection |  | 0
Category-Oriented Representation Learning for Image to Multi-Modal Retrieval |  | 0
Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset |  | 0
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications |  | 0
Multimodal Neurons in Pretrained Text-Only Transformers |  | 0
Natural Language Generation |  | 0
Natural Language Generation from Visual Sequences: Challenges and Future Directions |  | 0
Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels |  | 0
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation |  | 0
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation |  | 0
Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval |  | 0
Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models |  | 0
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting |  | 0
RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement |  | 0
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API |  | 0
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags |  | 0
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation |  | 0
Retrieval-Augmented Multimodal Language Modeling |  | 0
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning |  | 0
Revisiting DETR Pre-training for Object Detection |  | 0
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization |  | 0
Page 5 of 5

No leaderboard results yet.