SOTAVerified

Image to text

Papers

Showing 51-100 of 246 papers

Title | Status | Hype
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Code | 1
MAGVLT: Masked Generative Vision-and-Language Transformer | Code | 1
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | Code | 1
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Code | 1
Linearly Mapping from Image to Text Space | Code | 1
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Code | 1
Improving Image Restoration through Removing Degradations in Textual Representations | Code | 1
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts | Code | 1
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training | Code | 1
UniCMs: A Unified Consistency Model For Efficient Multimodal Generation and Understanding | Code | 1
Brain Captioning: Decoding human brain activity into images and text | Code | 1
Distilled Dual-Encoder Model for Vision-Language Understanding | Code | 1
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Code | 1
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation | Code | 1
Can MLLMs Perform Text-to-Image In-Context Learning? | Code | 1
What You See is What You Read? Improving Text-Image Alignment Evaluation | Code | 1
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | Code | 1
Language-Oriented Semantic Latent Representation for Image Transmission | Code | 1
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models | Code | 1
Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks | Code | 0
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Code | 0
Towards a text-based quantitative and explainable histopathology image analysis | Code | 0
Exploration into Translation-Equivariant Image Quantization | Code | 0
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings | Code | 0
Self-Supervised Image-to-Text and Text-to-Image Synthesis | Code | 0
SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects | Code | 0
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning | Code | 0
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics | Code | 0
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) | Code | 0
Delving into the Openness of CLIP | Code | 0
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models | Code | 0
Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task | Code | 0
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data | Code | 0
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Code | 0
Probing Multimodal Large Language Models for Global and Local Semantic Representations | Code | 0
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0
Pragmatic Radiology Report Generation | Code | 0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0
Adaptively Clustering Neighbor Elements for Image-Text Generation | Code | 0
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Code | 0
CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP | Code | 0
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Code | 0
Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Code | 0
MultiQG-TI: Towards Question Generation from Multi-modal Sources | Code | 0
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval | Code | 0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0
MirrorGAN: Learning Text-to-image Generation by Redescription | Code | 0
Multi-LLM Collaborative Caption Generation in Scientific Documents | Code | 0
Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment | Code | 0
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | Code | 0
Page 2 of 5

No leaderboard results yet.