SOTAVerified

Unsupervised Semantic Segmentation with Language-image Pre-training

A semantic segmentation task that uses no human supervision, except that the backbone is initialised with features pre-trained using image-level labels (e.g. image-text pairs, as in CLIP).
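Most entries below follow a common training-free recipe: a frozen image-text backbone produces per-patch image features, and each patch is assigned the class whose text embedding is most similar. The sketch below illustrates that assignment step only, with random placeholder features standing in for a real backbone (no CLIP model is loaded; shapes and names are illustrative assumptions).

```python
import numpy as np

# Sketch of the per-patch classification step in training-free
# open-vocabulary segmentation. Patch features and class text
# embeddings are random placeholders, not real CLIP outputs.
rng = np.random.default_rng(0)

num_patches, dim, num_classes = 14 * 14, 512, 3  # e.g. ViT-B/16 on a 224x224 image
patch_feats = rng.standard_normal((num_patches, dim))  # stand-in for frozen backbone output
text_embeds = rng.standard_normal((num_classes, dim))  # stand-in for class-name embeddings

def cosine_segment(patch_feats, text_embeds):
    """Assign each patch the class with the highest cosine similarity."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sim = p @ t.T                # (num_patches, num_classes) cosine similarities
    return sim.argmax(axis=1)    # per-patch class index

seg = cosine_segment(patch_feats, text_embeds).reshape(14, 14)
print(seg.shape)  # (14, 14) patch-level segmentation map
```

The papers listed here differ mainly in how they improve the patch features or the similarity computation (proxy attention, correlation reconstruction, self-distillation, etc.) before this assignment step.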

Papers

Showing 1–10 of 14 papers

| Title | Status | Hype |
|---|---|---|
| TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | Code | 2 |
| CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Code | 2 |
| Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation | Code | 2 |
| ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | Code | 2 |
| GroupViT: Semantic Segmentation Emerges from Text Supervision | Code | 2 |
| COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training | Code | 1 |
| ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements | Code | 1 |
| TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | Code | 1 |
| TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | Code | 1 |
| TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training | Code | 1 |

No leaderboard results yet.