SOTAVerified

zero-shot-classification

Papers

Showing 1–50 of 422 papers

| Title | Status | Hype |
|---|---|---|
| FG-CLIP: Fine-Grained Visual and Textual Alignment | Code | 4 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Code | 4 |
| Multimodal Whole Slide Foundation Model for Pathology | Code | 4 |
| Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4 |
| Long-CLIP: Unlocking the Long-Text Capability of CLIP | Code | 4 |
| LLM-Pruner: On the Structural Pruning of Large Language Models | Code | 3 |
| GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | Code | 2 |
| Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2 |
| DiffCLIP: Differential Attention Meets CLIP | Code | 2 |
| Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding | Code | 2 |
| BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Code | 2 |
| CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Code | 2 |
| Boosting Vision-Language Models for Histopathology Classification: Predict all at once | Code | 2 |
| Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification | Code | 2 |
| Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Code | 2 |
| RWKV-CLIP: A Robust Vision-Language Representation Learner | Code | 2 |
| CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation | Code | 2 |
| Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement | Code | 2 |
| CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification | Code | 2 |
| Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models | Code | 2 |
| VeCLIP: Improving CLIP Training via Visual-enriched Captions | Code | 2 |
| Uni3D: Exploring Unified 3D Representation at Scale | Code | 2 |
| RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing | Code | 2 |
| RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Code | 2 |
| Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning | Code | 2 |
| ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding | Code | 2 |
| Your Diffusion Model is Secretly a Zero-Shot Classifier | Code | 2 |
| ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation | Code | 2 |
| TabLLM: Few-shot Classification of Tabular Data with Large Language Models | Code | 2 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1 |
| From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection | Code | 1 |
| MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks | Code | 1 |
| Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction | Code | 1 |
| Advancing Medical Representation Learning Through High-Quality Data | Code | 1 |
| Controlling Latent Diffusion Using Latent CLIP | Code | 1 |
| CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Code | 1 |
| CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification | Code | 1 |
| LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models | Code | 1 |
| SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting | Code | 1 |
| CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections | Code | 1 |
| TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models | Code | 1 |
| CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation | Code | 1 |
| RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Code | 1 |
| Interpreting and Analysing CLIP's Zero-Shot Image Classification via Mutual Knowledge | Code | 1 |
| AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment | Code | 1 |
| DC3DO: Diffusion Classifier for 3D Objects | Code | 1 |
| Adversarial Robustification via Text-to-Image Diffusion Models | Code | 1 |
| Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition | Code | 1 |
| CountCLIP -- [Re] Teaching CLIP to Count to Ten | Code | 1 |
| Differentiable Model Scaling using Differentiable Topk | Code | 1 |

No leaderboard results yet.