| FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain Generalization, Few-Shot Learning | Code Available | 1 |
| Every picture tells a story: Image-grounded controllable stylistic story generation | Sep 4, 2022 | Image Captioning, Image to text | Unverified | 0 |
| Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning | Aug 18, 2022 | Image Generation, Image to text | Unverified | 0 |
| Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval | Jul 29, 2022 | Cross-Modal Retrieval, Data Augmentation | Unverified | 0 |
| SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification | Jul 1, 2022 | Image to text | Unverified | 0 |
| What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Jun 19, 2022 | Benchmarking, Image Captioning | Code Available | 1 |
| Write and Paint: Generative Vision-Language Models are Unified Modal Learners | Jun 15, 2022 | Image Generation, Image to text | Code Available | 1 |
| Delving into the Openness of CLIP | Jun 4, 2022 | image-classification, Image Classification | Code Available | 0 |
| Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | Jun 1, 2022 | Caption Generation, image-classification | Unverified | 0 |
| GIT: A Generative Image-to-text Transformer for Vision and Language | May 27, 2022 | Decoder, Image Captioning | Code Available | 2 |