| Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Sep 29, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available | 1 |
| CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Jun 21, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| SignCLIP: Connecting Text and Sign Language by Contrastive Learning | Jul 1, 2024 | Contrastive Learning, Retrieval | Code Available | 1 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action Recognition, Linear evaluation | Code Available | 1 |
| Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Apr 7, 2018 | Retrieval, Text Retrieval | Code Available | 1 |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Apr 26, 2022 | Action Recognition, Retrieval | Code Available | 1 |
| COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Aug 5, 2024 | Dense Video Captioning, Diversity | Code Available | 1 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | Dec 9, 2022 | Question Answering, Retrieval | Unverified | 0 |
| i-Code Studio: A Configurable and Composable Framework for Integrative AI | May 23, 2023 | Question Answering, Retrieval | Unverified | 0 |
| Sakuga-42M Dataset: Scaling Up Cartoon Research | May 13, 2024 | Mamba, Text to Video Retrieval | Unverified | 0 |