| Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search | Jan 8, 2021 | Descriptive, Sentence | Code Available | 1 | 5 |
| ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax | Mar 2, 2023 | Descriptive, Image Captioning | Code Available | 1 | 5 |
| Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings | Jan 28, 2024 | Contrastive Learning, Descriptive | Code Available | 1 | 5 |
| LaMOT: Language-Guided Multi-Object Tracking | Jun 12, 2024 | Descriptive, Multi-Object Tracking | Code Available | 1 | 5 |
| GL-RG: Global-Local Representation Granularity for Video Captioning | May 22, 2022 | Caption Generation, Descriptive | Code Available | 1 | 5 |
| Controlling Latent Diffusion Using Latent CLIP | Mar 11, 2025 | Denoising, Descriptive | Code Available | 1 | 5 |
| Beyond Co-occurrence: Multi-modal Session-based Recommendation | Sep 29, 2023 | Contrastive Learning, Descriptive | Code Available | 1 | 5 |
| CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions | Dec 8, 2020 | counterfactual, Descriptive | Code Available | 1 | 5 |
| Bias Loss for Mobile Neural Networks | Jul 23, 2021 | Descriptive, Diversity | Code Available | 1 | 5 |
| From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering | May 30, 2022 | counterfactual, Descriptive | Code Available | 1 | 5 |