| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi; Action Classification | Code Available | 3 |
| Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | May 11, 2023 | Contrastive Learning; Image-text Retrieval | Code Available | 1 |
| From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Apr 26, 2023 | Decoder; Image Captioning | Code Available | 1 |
| Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining | Apr 25, 2023 | Articles; Image-text Retrieval | Unverified | 0 |
| Learnable Pillar-based Re-ranking for Image-Text Retrieval | Apr 25, 2023 | Image-text Retrieval; Re-Ranking | Code Available | 1 |
| Rethinking Benchmarks for Cross-modal Image-text Retrieval | Apr 21, 2023 | Cross-Modal Retrieval; Image-text Retrieval | Code Available | 1 |
| Image-text Retrieval via Preserving Main Semantics of Vision | Apr 20, 2023 | Cross-Modal Retrieval; Image-text Retrieval | Code Available | 1 |
| Hyperbolic Image-Text Representations | Apr 18, 2023 | image-classification; Image Classification | Code Available | 1 |
| RECLIP: Resource-efficient CLIP by Training with Small Images | Apr 12, 2023 | Contrastive Learning; Image-text Retrieval | Unverified | 0 |
| Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval | Apr 6, 2023 | Cross-Modal Retrieval; Image-text Retrieval | Code Available | 0 |