SOTAVerified

Image to text

Papers

Showing 101–125 of 246 papers

Title | Status | Hype
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | | 0
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | | 0
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) | Code | 0
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | | 0
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image | | 0
An Online Learning Approach to Prompt-based Selection of Generative Models | | 0
Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models | | 0
Backdooring Vision-Language Models with Out-Of-Distribution Data | | 0
See then Tell: Enhancing Key Information Extraction with Vision Grounding | | 0
TrojVLM: Backdoor Attack Against Vision Language Models | | 0
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | | 0
Evaluating authenticity and quality of image captions via sentiment and semantic analyses | | 0
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models | | 0
Instruction Tuning-free Visual Token Complement for Multimodal LLMs | | 0
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Code | 0
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | | 0
GPC: Generative and General Pathology Image Classifier | | 0
15M Multimodal Facial Image-Text Dataset | | 0
Towards a text-based quantitative and explainable histopathology image analysis | Code | 0
Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation | | 0
HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels | | 0
Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything | | 0
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning | Code | 0
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags | | 0
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Code | 0

No leaderboard results yet.