| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | Oct 12, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing | Oct 12, 2023 | Image Generation, Image to text | Unverified | 0 |
| Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition | Oct 8, 2023 | Image to text, Optical Character Recognition (OCR) | Code Available | 1 |
| Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | Oct 7, 2023 | Decoder, document understanding | Unverified | 0 |
| Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency | Oct 5, 2023 | Image Generation, Image to text | Unverified | 0 |
| Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | Sep 29, 2023 | Image to text, Passage Retrieval | Code Available | 2 |
| Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Sep 28, 2023 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 0 |
| SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution | Sep 25, 2023 | Image to text | Unverified | 0 |
| Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels | Sep 18, 2023 | Generative Adversarial Network, Handwriting Recognition | Unverified | 0 |
| CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval | Sep 18, 2023 | Image to text, Person Retrieval | Code Available | 0 |