SOTAVerified

Image to text

Papers

Showing 201–246 of 246 papers

| Title | Status | Hype |
| --- | --- | --- |
| Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages |  | 0 |
| Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation |  | 0 |
| Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation |  | 0 |
| When are Lemons Purple? The Concept Association Bias of Vision-Language Models |  | 0 |
| X-Fusion: Introducing New Modality to Frozen Large Language Models |  | 0 |
| 15M Multimodal Facial Image-Text Dataset |  | 0 |
| RefineNet: Enhancing Text-to-Image Conversion with High-Resolution and Detail Accuracy through Hierarchical Transformers and Progressive Refinement |  | 0 |
| Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API |  | 0 |
| Pragmatic Radiology Report Generation | Code | 0 |
| Probing Multimodal Large Language Models for Global and Local Semantic Representations | Code | 0 |
| MultiQG-TI: Towards Question Generation from Multi-modal Sources | Code | 0 |
| PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0 |
| PromptHash:Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Code | 0 |
| Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Code | 0 |
| Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data | Code | 0 |
| GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Code | 0 |
| CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval | Code | 0 |
| Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment | Code | 0 |
| Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation | Code | 0 |
| UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings | Code | 0 |
| Delving into the Openness of CLIP | Code | 0 |
| Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks | Code | 0 |
| Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) | Code | 0 |
| Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0 |
| Adaptively Clustering Neighbor Elements for Image-Text Generation | Code | 0 |
| Towards a text-based quantitative and explainable histopathology image analysis | Code | 0 |
| A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation | Code | 0 |
| Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Code | 0 |
| RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models | Code | 0 |
| Multi-LLM Collaborative Caption Generation in Scientific Documents | Code | 0 |
| MirrorGAN: Learning Text-to-image Generation by Redescription | Code | 0 |
| MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | Code | 0 |
| Effective Use of Word Order for Text Categorization with Convolutional Neural Networks | Code | 0 |
| Self-Supervised Image-to-Text and Text-to-Image Synthesis | Code | 0 |
| Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards | Code | 0 |
| BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Code | 0 |
| Exploration into Translation-Equivariant Image Quantization | Code | 0 |
| Zero-shot Nuclei Detection via Visual-Language Pre-trained Models | Code | 0 |
| VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations | Code | 0 |
| A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning | Code | 0 |
| SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects | Code | 0 |
| Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Code | 0 |
| Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task | Code | 0 |
| CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Code | 0 |
| Survey on Abstractive Text Summarization: Dataset, Models, and Metrics | Code | 0 |
| CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP | Code | 0 |

No leaderboard results yet.