| GAFNet: A Global Fourier Self Attention Based Novel Network for multi-modal downstream tasks | Jan 1, 2023 | Image Generation, Image-text Retrieval | Unverified | 0 |
| Efficient Image Captioning for Edge Devices | Dec 18, 2022 | CPU, Image Captioning | Unverified | 0 |
| HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval | Dec 16, 2022 | Image-text Retrieval, Retrieval | Unverified | 0 |
| FlexiViT: One Model for All Patch Sizes | Dec 15, 2022 | All, Image-text Retrieval | Code Available | 1 |
| Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Dec 15, 2022 | Benchmarking, Image Captioning | Code Available | 1 |
| NLIP: Noise-robust Language-Image Pre-training | Dec 14, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing | Dec 12, 2022 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 |
| Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | Dec 2, 2022 | Image-text Retrieval, Retrieval | Unverified | 0 |
| ComCLIP: Training-Free Compositional Image and Text Matching | Nov 25, 2022 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Nov 24, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |