| Image Semantic Relation Generation | Oct 19, 2022 | Image Retrieval, Image Segmentation | —Unverified | 0 |
| Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning | Feb 9, 2023 | Few-Shot Learning, Image Captioning | —Unverified | 0 |
| Revisiting DETR Pre-training for Object Detection | Aug 2, 2023 | Image to text, Object | —Unverified | 0 |
| Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | Sep 26, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 |
| Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | Oct 30, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 |
| Robustifying Vision-Language Models via Dynamic Token Reweighting | May 22, 2025 | Image to text | —Unverified | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | Sep 29, 2024 | Image to text, Key Information Extraction | —Unverified | 0 |
| SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Apr 17, 2025 | Cross-Modal Retrieval, Image Retrieval | —Unverified | 0 |
| Sequential Semantic Generative Communication for Progressive Text-to-Image Generation | Sep 8, 2023 | Image Generation, Image to text | —Unverified | 0 |
| SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing | Oct 12, 2023 | Image Generation, Image to text | —Unverified | 0 |