SOTAVerified

Image to text

Papers

Showing 26-50 of 246 papers

Title | Status | Hype
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding | Code | 1
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Code | 1
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training | Code | 1
See or Guess: Counterfactually Regularized Image Captioning | Code | 1
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Code | 1
CMC-Bench: Towards a New Paradigm of Visual Signal Compression | Code | 1
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | Code | 1
Language-Oriented Semantic Latent Representation for Image Transmission | Code | 1
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Code | 1
ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | Code | 1
Can MLLMs Perform Text-to-Image In-Context Learning? | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1
Improving Image Restoration through Removing Degradations in Textual Representations | Code | 1
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models | Code | 1
UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web | Code | 1
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition | Code | 1
Multimodal Foundation Models For Echocardiogram Interpretation | Code | 1
Beyond One-to-One: Rethinking the Referring Image Segmentation | Code | 1
Vision-Language Dataset Distillation | Code | 1
Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval | Code | 1
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Code | 1
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports | Code | 1
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training | Code | 1
Brain Captioning: Decoding human brain activity into images and text | Code | 1
What You See is What You Read? Improving Text-Image Alignment Evaluation | Code | 1

No leaderboard results yet.