SOTAVerified

Image to text

Papers

Showing 101-150 of 246 papers

| Title | Status | Hype |
| --- | --- | --- |
| Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models | | 0 |
| Can MLLMs Perform Text-to-Image In-Context Learning? | Code | 1 |
| Dynamic Traceback Learning for Medical Report Generation | | 0 |
| Benchmarking Large Multimodal Models against Common Corruptions | Code | 1 |
| CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Code | 0 |
| SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | | 0 |
| Accept the Modality Gap: An Exploration in the Hyperbolic Space | | 0 |
| Improving Image Restoration through Removing Degradations in Textual Representations | Code | 1 |
| RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement | | 0 |
| DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models | | 0 |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | | 0 |
| Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval | | 0 |
| Pragmatic Radiology Report Generation | Code | 0 |
| Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models | Code | 1 |
| Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation | | 0 |
| Efficient End-to-End Visual Document Understanding with Rationale Distillation | | 0 |
| AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method | | 0 |
| Semantically Grounded QFormer for Efficient Vision Language Understanding | | 0 |
| GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks | | 0 |
| UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web | Code | 1 |
| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | | 0 |
| SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing | | 0 |
| Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition | Code | 1 |
| Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | | 0 |
| Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency | | 0 |
| Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | Code | 2 |
| Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0 |
| SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution | | 0 |
| Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels | | 0 |
| CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval | Code | 0 |
| BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification | | 0 |
| Sequential Semantic Generative Communication for Progressive Text-to-Image Generation | | 0 |
| Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning | Code | 2 |
| Multimodal Foundation Models For Echocardiogram Interpretation | Code | 1 |
| Beyond One-to-One: Rethinking the Referring Image Segmentation | Code | 1 |
| Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages | Code | 6 |
| GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | | 0 |
| Vision-Language Dataset Distillation | Code | 1 |
| Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval | Code | 1 |
| Multimodal Neurons in Pretrained Text-Only Transformers | | 0 |
| Revisiting DETR Pre-training for Object Detection | | 0 |
| Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Code | 1 |
| PRIOR: Prototype Representation Joint Learning from Medical Images and Reports | Code | 1 |
| Towards a Visual-Language Foundation Model for Computational Pathology | | 0 |
| Planting a SEED of Vision in Large Language Model | Code | 2 |
| PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting | | 0 |
| Bootstrapping Vision-Language Learning with Decoupled Language Pre-training | Code | 1 |
| Emu: Generative Pretraining in Multimodality | Code | 3 |
| MultiQG-TI: Towards Question Generation from Multi-modal Sources | Code | 0 |
| Zero-shot Nuclei Detection via Visual-Language Pre-trained Models | Code | 0 |

No leaderboard results yet.