| Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings | Jan 28, 2024 | Contrastive Learning, Descriptive | Code Available | 1 |
| From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering | May 30, 2022 | counterfactual, Descriptive | Code Available | 1 |
| What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights | May 31, 2024 | Descriptive, Self-Supervised Learning | Code Available | 1 |
| Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search | Feb 2, 2021 | Descriptive, Image Generation | Code Available | 1 |
| GL-RG: Global-Local Representation Granularity for Video Captioning | May 22, 2022 | Caption Generation, Descriptive | Code Available | 1 |
| GOAL: Global-local Object Alignment Learning | Mar 22, 2025 | Descriptive, Object | Code Available | 1 |
| Contrastive Audio-Language Learning for Music | Aug 25, 2022 | Audio to Text Retrieval, Descriptive | Code Available | 1 |
| Contrastive Learning of Medical Visual Representations from Paired Images and Text | Oct 2, 2020 | Contrastive Learning, Descriptive | Code Available | 1 |
| ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax | Mar 2, 2023 | Descriptive, Image Captioning | Code Available | 1 |
| Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search | Jan 8, 2021 | Descriptive, Sentence | Code Available | 1 |