SOTAVerified

Image to text

Papers

Showing 51-75 of 246 papers

| Title | Status | Hype |
|---|---|---|
| Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition | Code | 1 |
| Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts | Code | 1 |
| UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding | Code | 1 |
| UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Code | 1 |
| Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | Code | 1 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1 |
| Linearly Mapping from Image to Text Space | Code | 1 |
| What You See is What You Read? Improving Text-Image Alignment Evaluation | Code | 1 |
| Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Code | 1 |
| DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Code | 1 |
| LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Code | 1 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Code | 1 |
| Can MLLMs Perform Text-to-Image In-Context Learning? | Code | 1 |
| Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Code | 1 |
| FETA: Towards Specializing Foundation Models for Expert Task Applications | Code | 1 |
| ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Code | 1 |
| Language-Oriented Semantic Latent Representation for Image Transmission | Code | 1 |
| MAGVLT: Masked Generative Vision-and-Language Transformer | Code | 1 |
| Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models | Code | 1 |
| Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning | | 0 |
| DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | | 0 |
| DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models | | 0 |
| Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models | | 0 |
| DiffusionSTR: Diffusion Model for Scene Text Recognition | | 0 |
| Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese | | 0 |
Page 3 of 10

No leaderboard results yet.