SOTAVerified

Image to text

Papers

Showing 76–100 of 246 papers

Title | Status | Hype
SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects | Code | 0
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning | Code | 0
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics | Code | 0
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) | Code | 0
Delving into the Openness of CLIP | Code | 0
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models | Code | 0
Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task | Code | 0
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data | Code | 0
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Code | 0
Probing Multimodal Large Language Models for Global and Local Semantic Representations | Code | 0
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0
Pragmatic Radiology Report Generation | Code | 0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0
Adaptively Clustering Neighbor Elements for Image-Text Generation | Code | 0
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Code | 0
CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP | Code | 0
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Code | 0
Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Code | 0
MultiQG-TI: Towards Question Generation from Multi-modal Sources | Code | 0
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval | Code | 0
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0
MirrorGAN: Learning Text-to-image Generation by Redescription | Code | 0
Multi-LLM Collaborative Caption Generation in Scientific Documents | Code | 0
Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment | Code | 0
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | Code | 0

No leaderboard results yet.