SOTAVerified

zero-shot-classification

Papers

Showing 51-100 of 422 papers

| Title | Status | Hype |
| --- | --- | --- |
| Captured by Captions: On Memorization and its Mitigation in CLIP Models | | 0 |
| DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | | 0 |
| LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models | Code | 1 |
| Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding | Code | 2 |
| Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Code | 0 |
| KPL: Training-Free Medical Knowledge Mining of Vision-Language Models | Code | 0 |
| FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing | | 0 |
| BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Code | 2 |
| A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI | Code | 0 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Code | 4 |
| LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User Queries | | 0 |
| Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation | | 0 |
| Cross-Modal 3D Representation with Multi-View Images and Point Clouds | | 0 |
| Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio | | 0 |
| DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment | Code | 0 |
| Adaptive Pruning for Large Language Models with Structural Importance Awareness | | 0 |
| Zero-Shot Image Moderation in Google Ads with LLM-Assisted Textual Descriptions and Cross-modal Co-embeddings | | 0 |
| CRoF: CLIP-based Robust Few-shot Learning on Noisy Labels | | 0 |
| A Simple and Efficient Baseline for Zero-Shot Generative Classification | | 0 |
| An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques | Code | 0 |
| SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting | Code | 1 |
| Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? | Code | 0 |
| Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning | | 0 |
| S^3: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models | | 0 |
| Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning | Code | 0 |
| Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks | Code | 0 |
| Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP | Code | 0 |
| Multimodal Whole Slide Foundation Model for Pathology | Code | 4 |
| CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections | Code | 1 |
| Active Data Curation Effectively Distills Large-Scale Multimodal Models | | 0 |
| TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models | Code | 1 |
| CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation | Code | 1 |
| CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Code | 2 |
| Measuring similarity between embedding spaces using induced neighborhood graphs | | 0 |
| NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | | 0 |
| Enhancing Visual Classification using Comparative Descriptors | Code | 0 |
| Asterisk*: Keep it Simple | | 0 |
| RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Code | 1 |
| ResiDual Transformer Alignment with Spectral Decomposition | | 0 |
| Active Learning for Vision-Language Models | | 0 |
| Fine-tuned Large Language Models (LLMs): Improved Prompt Injection Attacks Detection | | 0 |
| Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models | | 0 |
| MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Code | 0 |
| Assessing Open-world Forgetting in Generative Image Model Customization | | 0 |
| Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data? | | 0 |
| LLM Chain Ensembles for Scalable and Accurate Data Annotation | Code | 0 |
| Interpreting and Analysing CLIP's Zero-Shot Image Classification via Mutual Knowledge | Code | 1 |
| CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | | 0 |
| A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks | Code | 0 |
| GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models | Code | 0 |

No leaderboard results yet.