| Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Sep 29, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available | 1 |
| CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Jun 21, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| SignCLIP: Connecting Text and Sign Language by Contrastive Learning | Jul 1, 2024 | Contrastive Learning, Retrieval | Code Available | 1 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action Recognition, Linear evaluation | Code Available | 1 |
| Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Apr 7, 2018 | Retrieval, Text Retrieval | Code Available | 1 |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Apr 26, 2022 | Action Recognition, Retrieval | Code Available | 1 |
| COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Aug 5, 2024 | Dense Video Captioning, Diversity | Code Available | 1 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | Dec 9, 2022 | Question Answering, Retrieval | Unverified | 0 |
| i-Code Studio: A Configurable and Composable Framework for Integrative AI | May 23, 2023 | Question Answering, Retrieval | Unverified | 0 |
| Sakuga-42M Dataset: Scaling Up Cartoon Research | May 13, 2024 | Mamba, Text to Video Retrieval | Unverified | 0 |
| SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities | Nov 4, 2024 | Attribute, Descriptive | Unverified | 0 |
| MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Jun 20, 2023 | Cross-Lingual Transfer, Retrieval | Code Available | 0 |
| Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | Apr 1, 2022 | Diversity, Image Captioning | Code Available | 0 |