SOTAVerified

Image to text

Papers

Showing 51-75 of 246 papers

Title | Status | Hype
Multimodal Procedural Planning via Dual Text-Image Prompting | Code | 1
MAGVLT: Masked Generative Vision-and-Language Transformer | Code | 1
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation | Code | 1
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts | Code | 1
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | Code | 1
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models | Code | 1
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation | Code | 1
Linearly Mapping from Image to Text Space | Code | 1
FETA: Towards Specializing Foundation Models for Expert Task Applications | Code | 1
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Code | 1
Write and Paint: Generative Vision-Language Models are Unified Modal Learners | Code | 1
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Code | 1
Distilled Dual-Encoder Model for Vision-Language Understanding | Code | 1
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic | Code | 1
L-Verse: Bidirectional Generation Between Image and Text | Code | 1
Unifying Multimodal Transformer for Bi-directional Image and Text Generation | Code | 1
Concadia: Towards Image-Based Text Generation with a Purpose | Code | 1
Progressive Transformer-Based Generation of Radiology Reports | Code | 1
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Code | 1
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | - | 0
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | - | 0
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP | - | 0
BRIT: Bidirectional Retrieval over Unified Image-Text Graph | - | 0
Robustifying Vision-Language Models via Dynamic Token Reweighting | - | 0
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings | Code | 0

No leaderboard results yet.