Image to text

Papers

Showing 101–125 of 246 papers

Title	Date	Tasks	Status
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing	Nov 5, 2024	Change Detection, Contrastive Learning	Unverified
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization	Oct 30, 2024	Image to text, Image-to-Text Retrieval	Unverified
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)	Oct 25, 2024	Attribute, Image to text	Code Available
Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics	Oct 24, 2024	Image to text, Image-Variation	Unverified
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image	Oct 20, 2024	Image to text	Unverified
An Online Learning Approach to Prompt-based Selection of Generative Models	Oct 17, 2024	Image to text	Unverified
Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models	Oct 7, 2024	Image to text	Unverified
Backdooring Vision-Language Models with Out-Of-Distribution Data	Oct 2, 2024	Image Captioning, Image to text	Unverified
See then Tell: Enhancing Key Information Extraction with Vision Grounding	Sep 29, 2024	Image to text, Key Information Extraction	Unverified
TrojVLM: Backdoor Attack Against Vision Language Models	Sep 28, 2024	Backdoor Attack, Image Captioning	Unverified
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization	Sep 26, 2024	Image to text, Image-to-Text Retrieval	Unverified
Evaluating authenticity and quality of image captions via sentiment and semantic analyses	Sep 14, 2024	Image Captioning, Image to text	Unverified
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models	Aug 16, 2024	Image to text	Unverified
Instruction Tuning-free Visual Token Complement for Multimodal LLMs	Aug 9, 2024	Image Generation, Image to text	Unverified
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models	Jul 30, 2024	Image to text, Image-to-Text Retrieval	Code Available
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Jul 25, 2024	Image to text, Language Modeling	Unverified
GPC: Generative and General Pathology Image Classifier	Jul 12, 2024	Classification, image-classification	Unverified
15M Multimodal Facial Image-Text Dataset	Jul 11, 2024	Image to text	Unverified
Towards a text-based quantitative and explainable histopathology image analysis	Jul 10, 2024	image-classification, Image Classification	Code Available
Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation	Jul 8, 2024	Image to text, Lifelong learning	Unverified
HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels	Jul 8, 2024	Contrastive Learning, Image Retrieval	Unverified
Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything	Jul 1, 2024	Image to text, Language Modeling	Unverified
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning	Jun 20, 2024	Diagnostic, Image to text	Code Available
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags	Jun 16, 2024	Image to text, Instruction Following	Unverified
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval	Jun 14, 2024	Image Retrieval, Image to text	Code Available

No leaderboard results yet.