| From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Nov 5, 2024 | Change Detection, Contrastive Learning | Unverified | 0 |
| Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | Oct 30, 2024 | Image to text, Image-to-Text Retrieval | Unverified | 0 |
| Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | Oct 28, 2024 | Image Retrieval, Image to text | Code Available | 2 |
| Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) | Oct 25, 2024 | Attribute, Image to text | Code Available | 0 |
| Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | Oct 24, 2024 | Image to text, Image-Variation | Unverified | 0 |
| Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image | Oct 20, 2024 | Image to text | Unverified | 0 |
| An Online Learning Approach to Prompt-based Selection of Generative Models | Oct 17, 2024 | Image to text | Unverified | 0 |
| Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models | Oct 7, 2024 | Image to text | Unverified | 0 |
| Backdooring Vision-Language Models with Out-Of-Distribution Data | Oct 2, 2024 | Image Captioning, Image to text | Unverified | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | Sep 29, 2024 | Image to text, Key Information Extraction | Unverified | 0 |