| It's Not a Modality Gap: Characterizing and Addressing the Contrastive Gap | May 28, 2024 | image-classification, Image Classification | Unverified | 0 |
| MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding | May 28, 2024 | 3D Classification, 3D Object Recognition | Unverified | 0 |
| Listenable Maps for Zero-Shot Audio Classifiers | May 27, 2024 | Decoder, zero-shot-classification | Unverified | 0 |
| CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification | May 26, 2024 | zero-shot-classification, Zero-Shot Learning | Code Available | 0 |
| What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | May 24, 2024 | Classification, image-classification | Code Available | 0 |
| BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection | May 24, 2024 | Contrastive Learning, Language Modelling | Unverified | 0 |
| Tuning-free Universally-Supervised Semantic Segmentation | May 23, 2024 | Segmentation, Semantic Segmentation | Unverified | 0 |
| Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations | May 23, 2024 | Contrastive Learning, Instance Segmentation | Code Available | 0 |
| Stylometric Watermarks for Large Language Models | May 14, 2024 | Sentence, zero-shot-classification | Unverified | 0 |
| CLIP with Quality Captions: A Strong Pretraining for Vision Tasks | May 14, 2024 | Depth Estimation, object-detection | Unverified | 0 |
| Differentiable Model Scaling using Differentiable Topk | May 12, 2024 | GPU, image-classification | Code Available | 1 |
| Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA | May 11, 2024 | Computational Efficiency, Language Modelling | Unverified | 0 |
| Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification | May 10, 2024 | Decoder, image-classification | Code Available | 1 |
| The Effect of Model Size on LLM Post-hoc Explainability via LIME | May 8, 2024 | Natural Language Inference, zero-shot-classification | Code Available | 0 |
| Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval | May 6, 2024 | Image Retrieval, Language Modeling | Unverified | 0 |
| Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training | May 5, 2024 | Domain Adaptation, Language Modelling | Code Available | 0 |
| CLIPArTT: Adaptation of CLIP to New Domains at Test Time | May 1, 2024 | Pseudo Label, Test-time Adaptation | Code Available | 1 |
| Modeling Caption Diversity in Contrastive Vision-Language Pretraining | Apr 30, 2024 | Diversity, zero-shot-classification | Code Available | 1 |
| CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation | Apr 30, 2024 | Mamba, State Space Models | Code Available | 2 |
| ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud | Apr 30, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| Prevalent Frequency of Emotional and Physical Symptoms in Social Anxiety using Zero Shot Classification: An Observational Study | Apr 26, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data | Apr 25, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| OpenDlign: Open-World Point Cloud Understanding with Depth-Aligned Images | Apr 25, 2024 | Representation Learning, Transfer Learning | Code Available | 1 |
| Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class | Apr 25, 2024 | Diversity, zero-shot-classification | Unverified | 0 |
| Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification | Apr 17, 2024 | Classification, text-classification | Unverified | 0 |
| Two-Stage Stance Labeling: User-Hashtag Heuristics with Graph Neural Networks | Apr 16, 2024 | Graph Neural Network, zero-shot-classification | Unverified | 0 |
| Knowledge-enhanced Visual-Language Pretraining for Computational Pathology | Apr 15, 2024 | Cross-Modal Retrieval, Language Modeling | Code Available | 1 |
| Evolving Interpretable Visual Classifiers with Large Language Models | Apr 15, 2024 | In-Context Learning, Language Modeling | Unverified | 0 |
| Connecting NeRFs, Images, and Text | Apr 11, 2024 | NeRF, Representation Learning | Code Available | 0 |
| Label Propagation for Zero-shot Classification with Vision-Language Models | Apr 5, 2024 | Classification, Image Classification | Code Available | 1 |
| Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish | Apr 5, 2024 | Language Modelling, Natural Language Inference | Code Available | 0 |
| Training-Free Semantic Segmentation via LLM-Supervision | Mar 31, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification | Mar 23, 2024 | image-classification, Image Classification | Code Available | 1 |
| Long-CLIP: Unlocking the Long-Text Capability of CLIP | Mar 22, 2024 | Image Generation, Image Retrieval | Code Available | 4 |
| CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation | Mar 19, 2024 | Decoder, Instance Segmentation | Code Available | 1 |
| MEDBind: Unifying Language and Multimodal Medical Data Embeddings | Mar 19, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision | Mar 19, 2024 | Cross-corpus, Emotion Recognition | Unverified | 0 |
| Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition | Mar 19, 2024 | Dense Captioning, Image Captioning | Unverified | 0 |
| Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models | Mar 14, 2024 | Continual Learning, Knowledge Distillation | Unverified | 0 |
| MoralBERT: A Fine-Tuned Language Model for Capturing Moral Values in Social Discussions | Mar 12, 2024 | Domain Adaptation, Language Modeling | Code Available | 1 |
| Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement | Mar 11, 2024 | Clinical Knowledge, Descriptive | Code Available | 2 |
| Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use | Mar 5, 2024 | image-classification, Image Classification | Unverified | 0 |
| Leveraging Weakly Annotated Data for Hate Speech Detection in Code-Mixed Hinglish: A Feasibility-Driven Transfer Learning Approach with Large Language Models | Mar 4, 2024 | Few-Shot Learning, Hate Speech Detection | Unverified | 0 |
| On the use of Silver Standard Data for Zero-shot Classification Tasks in Information Extraction | Feb 28, 2024 | Classification, Natural Language Inference | Code Available | 0 |
| TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding | Feb 28, 2024 | 3D Shape Representation, Representation Learning | Unverified | 0 |
| CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification | Feb 27, 2024 | Classification, Diagnostic | Code Available | 2 |
| Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models | Feb 19, 2024 | Adversarial Defense, Multimodal Deep Learning | Code Available | 2 |
| Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models | Feb 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Direct side information learning for zero-shot regression | Feb 2, 2024 | global-optimization, image-classification | Code Available | 0 |
| Zero-shot Classification using Hyperdimensional Computing | Jan 30, 2024 | Attribute, Attribute Extraction | Unverified | 0 |