| Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Sep 29, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available | 1 | 5 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action Recognition, Linear evaluation | Code Available | 1 | 5 |
| CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Jun 21, 2021 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| SignCLIP: Connecting Text and Sign Language by Contrastive Learning | Jul 1, 2024 | Contrastive Learning, Retrieval | Code Available | 1 | 5 |
| Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Apr 7, 2018 | Retrieval, Text Retrieval | Code Available | 1 | 5 |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Apr 26, 2022 | Action Recognition, Retrieval | Code Available | 1 | 5 |
| COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Aug 5, 2024 | Dense Video Captioning, Diversity | Code Available | 1 | 5 |
| MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Jun 20, 2023 | Cross-Lingual Transfer, Retrieval | Code Available | 0 | 5 |
| Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | Apr 1, 2022 | Diversity, Image Captioning | Code Available | 0 | 5 |
| i-Code Studio: A Configurable and Composable Framework for Integrative AI | May 23, 2023 | Question Answering, Retrieval | Unverified | 0 | 0 |