| From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Nov 5, 2024 | Change DetectionContrastive Learning | —Unverified | 0 |
| Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | Oct 30, 2024 | Image to textImage-to-Text Retrieval | —Unverified | 0 |
| Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) | Oct 25, 2024 | AttributeImage to text | CodeCode Available | 0 |
| Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | Oct 24, 2024 | Image to textImage-Variation | —Unverified | 0 |
| Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image | Oct 20, 2024 | Image to text | —Unverified | 0 |
| An Online Learning Approach to Prompt-based Selection of Generative Models | Oct 17, 2024 | Image to text | —Unverified | 0 |
| Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models | Oct 7, 2024 | Image to text | —Unverified | 0 |
| Backdooring Vision-Language Models with Out-Of-Distribution Data | Oct 2, 2024 | Image CaptioningImage to text | —Unverified | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | Sep 29, 2024 | Image to textKey Information Extraction | —Unverified | 0 |
| TrojVLM: Backdoor Attack Against Vision Language Models | Sep 28, 2024 | Backdoor AttackImage Captioning | —Unverified | 0 |
| Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | Sep 26, 2024 | Image to textImage-to-Text Retrieval | —Unverified | 0 |
| Evaluating authenticity and quality of image captions via sentiment and semantic analyses | Sep 14, 2024 | Image CaptioningImage to text | —Unverified | 0 |
| Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models | Aug 16, 2024 | Image to text | —Unverified | 0 |
| Instruction Tuning-free Visual Token Complement for Multimodal LLMs | Aug 9, 2024 | Image GenerationImage to text | —Unverified | 0 |
| GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Jul 30, 2024 | Image to textImage-to-Text Retrieval | CodeCode Available | 0 |
| Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Jul 25, 2024 | Image to textLanguage Modeling | —Unverified | 0 |
| GPC: Generative and General Pathology Image Classifier | Jul 12, 2024 | Classificationimage-classification | —Unverified | 0 |
| 15M Multimodal Facial Image-Text Dataset | Jul 11, 2024 | Image to text | —Unverified | 0 |
| Towards a text-based quantitative and explainable histopathology image analysis | Jul 10, 2024 | image-classificationImage Classification | CodeCode Available | 0 |
| Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation | Jul 8, 2024 | Image to textLifelong learning | —Unverified | 0 |
| HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels | Jul 8, 2024 | Contrastive LearningImage Retrieval | —Unverified | 0 |
| Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything | Jul 1, 2024 | Image to textLanguage Modeling | —Unverified | 0 |
| A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning | Jun 20, 2024 | DiagnosticImage to text | CodeCode Available | 0 |
| Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags | Jun 16, 2024 | Image to textInstruction Following | —Unverified | 0 |
| BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Jun 14, 2024 | Image RetrievalImage to text | CodeCode Available | 0 |