| Does the Performance of Text-to-Image Retrieval Models Generalize Beyond Captions-as-a-Query? | Mar 15, 2024 | Descriptive, Image Captioning | Code Available | 0 |
| CSDNet: Detect Salient Object in Depth-Thermal via A Lightweight Cross Shallow and Deep Perception Network | Mar 15, 2024 | Descriptive, Informativeness | Unverified | 0 |
| Fundamental Components of Deep Learning: A category-theoretic approach | Mar 13, 2024 | Deep Learning, Descriptive | Code Available | 5 |
| Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery | Mar 12, 2024 | Descriptive, Retrieval | Code Available | 1 |
| Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement | Mar 11, 2024 | Clinical Knowledge, Descriptive | Code Available | 2 |
| FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications | Mar 11, 2024 | Attribute, Descriptive | Code Available | 1 |
| Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting | Mar 11, 2024 | Anatomy, Descriptive | Unverified | 0 |
| Structure Your Data: Towards Semantic Graph Counterfactuals | Mar 11, 2024 | counterfactual, Descriptive | Code Available | 0 |
| An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control | Mar 7, 2024 | Descriptive | Code Available | 2 |
| The Case for Evaluating Multimodal Translation Models on Text Datasets | Mar 5, 2024 | Descriptive, Image Captioning | Unverified | 0 |