| Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval | Dec 4, 2023 | Attribute, Cross-Modal Person Re-Identification | Unverified | 0 |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | Dec 4, 2023 | Image to text, object-detection | Unverified | 0 |
| Pragmatic Radiology Report Generation | Nov 28, 2023 | Image to text | Code Available | 0 |
| Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models | Nov 27, 2023 | Cross-Modal Retrieval, Image Generation | Code Available | 1 |
| Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation | Nov 18, 2023 | Image to text, Semantic Similarity | Unverified | 0 |
| AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method | Nov 16, 2023 | Image to text, Object | Unverified | 0 |
| Efficient End-to-End Visual Document Understanding with Rationale Distillation | Nov 16, 2023 | document understanding, Image to text | Unverified | 0 |
| Semantically Grounded QFormer for Efficient Vision Language Understanding | Nov 13, 2023 | Diversity, Image to text | Unverified | 0 |
| GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks | Nov 2, 2023 | Image Generation, Image to text | Unverified | 0 |
| UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web | Oct 22, 2023 | Image to text, Language Modeling | Code Available | 1 |