SOTAVerified

Image to text

Papers

Showing 201-225 of 246 papers

Title | Status | Hype
Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages | - | 0
Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation | - | 0
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | - | 0
When are Lemons Purple? The Concept Association Bias of Vision-Language Models | - | 0
X-Fusion: Introducing New Modality to Frozen Large Language Models | - | 0
15M Multimodal Facial Image-Text Dataset | - | 0
RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement | - | 0
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | - | 0
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags | - | 0
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation | - | 0
Retrieval-Augmented Multimodal Language Modeling | - | 0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0
PromptHash:Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Code | 0
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data | Code | 0
MirrorGAN: Learning Text-to-image Generation by Redescription | Code | 0
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval | Code | 0
Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment | Code | 0
Probing Multimodal Large Language Models for Global and Local Semantic Representations | Code | 0
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings | Code | 0
Delving into the Openness of CLIP | Code | 0
Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks | Code | 0
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) | Code | 0
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0
Adaptively Clustering Neighbor Elements for Image-Text Generation | Code | 0

No leaderboard results yet.