SOTAVerified

Image to text

Papers

Showing 51-75 of 246 papers

Title | Status | Hype
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | Code | 1
Improving Image Restoration through Removing Degradations in Textual Representations | Code | 1
Multimodal Procedural Planning via Dual Text-Image Prompting | Code | 1
ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | Code | 1
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1
See or Guess: Counterfactually Regularized Image Captioning | Code | 1
MAGVLT: Masked Generative Vision-and-Language Transformer | Code | 1
Write and Paint: Generative Vision-Language Models are Unified Modal Learners | Code | 1
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Code | 1
Progressive Transformer-Based Generation of Radiology Reports | Code | 1
Brain Captioning: Decoding human brain activity into images and text | Code | 1
Distilled Dual-Encoder Model for Vision-Language Understanding | Code | 1
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models | Code | 1
FETA: Towards Specializing Foundation Models for Expert Task Applications | Code | 1
Can MLLMs Perform Text-to-Image In-Context Learning? | Code | 1
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Code | 1
Unifying Multimodal Transformer for Bi-directional Image and Text Generation | Code | 1
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training | Code | 1
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Code | 1
PromptHash:Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Code | 0
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Code | 0
Pragmatic Radiology Report Generation | Code | 0
Probing Multimodal Large Language Models for Global and Local Semantic Representations | Code | 0
Page 3 of 10

No leaderboard results yet.