SOTAVerified

Visual Prompt Tuning

Visual Prompt Tuning (VPT) introduces only a small number of task-specific learnable parameters into the input space while keeping the entire pre-trained Transformer backbone frozen during downstream training. In practice, these additional parameters are simply prepended to the input sequence of each Transformer layer and learned together with a linear head during fine-tuning. VPT is especially effective in the low-data regime and maintains its advantage across data scales. VPT is also competitive across a range of Transformer scales and designs (ViT-Base/Large/Huge, Swin). Put together, the results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.
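
To make "a small number of task-specific learnable parameters" concrete, here is a minimal sketch that counts what is actually trained when prompts are prepended and the backbone stays frozen. The hidden width (768) and depth (12) are the standard ViT-B/16 values; the prompt length and class count are hypothetical placeholders, and the names `VPTConfig`, `prompt_params`, `head_params`, `trainable_params`, and `sequence_length` are illustrative only. The shallow/deep split mirrors the VPT-Shallow and VPT-Deep entries in the benchmark tables: shallow is assumed to add prompts at the first layer only, deep at every layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VPTConfig:
    hidden_dim: int = 768    # ViT-B/16 token width
    num_layers: int = 12     # ViT-B/16 Transformer depth
    num_prompts: int = 50    # hypothetical prompt length (assumption)
    num_classes: int = 100   # hypothetical downstream label count (assumption)


def prompt_params(cfg: VPTConfig, deep: bool) -> int:
    """Learnable prompt parameters: one set for shallow, one set per layer for deep."""
    per_layer = cfg.num_prompts * cfg.hidden_dim
    return per_layer * (cfg.num_layers if deep else 1)


def head_params(cfg: VPTConfig) -> int:
    """Linear classification head: weight matrix plus bias."""
    return cfg.hidden_dim * cfg.num_classes + cfg.num_classes


def trainable_params(cfg: VPTConfig, deep: bool) -> int:
    """Everything that receives gradients; the backbone itself is frozen."""
    return prompt_params(cfg, deep) + head_params(cfg)


def sequence_length(num_patches: int, cfg: VPTConfig) -> int:
    """Tokens seen by a prompted layer: [CLS] + prompts + image patches."""
    return 1 + cfg.num_prompts + num_patches


if __name__ == "__main__":
    cfg = VPTConfig()
    print("shallow trainable:", trainable_params(cfg, deep=False))
    print("deep trainable:   ", trainable_params(cfg, deep=True))
    # A 224x224 image with 16x16 patches yields 14 * 14 = 196 patch tokens.
    print("tokens per layer: ", sequence_length(196, cfg))
```

Under these assumed settings the trained parameters number in the hundreds of thousands, which is small next to a ViT-B/16 backbone; the cost is a longer token sequence in every prompted layer.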

Papers

Showing 1-10 of 70 papers

| Title | Status | Hype |
| --- | --- | --- |
| Visual Prompt Tuning | Code | 3 |
| Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis | Code | 2 |
| CoLLaVO: Crayon Large Language and Vision mOdel | Code | 2 |
| ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Code | 2 |
| Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis | Code | 2 |
| CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning | Code | 1 |
| Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model | Code | 1 |
| Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning | Code | 1 |
| CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task | Code | 1 |
| DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 86 | | Unverified |
| 2 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 84.08 | | Unverified |
| 3 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 83.26 | | Unverified |
| 4 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.12 | | Unverified |
| 5 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83 | | Unverified |
| 6 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 79.26 | | Unverified |
| 7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 73.95 | | Unverified |
| 8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 73.39 | | Unverified |
| 9 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 72.02 | | Unverified |
| 10 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 57.84 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 76.2 | | Unverified |
| 2 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 74.84 | | Unverified |
| 3 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 74.47 | | Unverified |
| 4 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 70.27 | | Unverified |
| 5 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 67.34 | | Unverified |
| 6 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 67.19 | | Unverified |
| 7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 62.53 | | Unverified |
| 8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 47.61 | | Unverified |
| 9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 39.96 | | Unverified |
| 10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 36.02 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 84.95 | | Unverified |
| 2 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.93 | | Unverified |
| 3 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.38 | | Unverified |
| 4 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 83.15 | | Unverified |
| 5 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.04 | | Unverified |
| 6 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 82.26 | | Unverified |
| 7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 80.9 | | Unverified |
| 8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 76.86 | | Unverified |
| 9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 69.65 | | Unverified |
| 10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 60.61 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 59.23 | | Unverified |
| 2 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 58.36 | | Unverified |
| 3 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 55.16 | | Unverified |
| 4 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 53.46 | | Unverified |
| 5 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 49.1 | | Unverified |
| 6 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 42.38 | | Unverified |
| 7 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 37.55 | | Unverified |
| 8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 36.8 | | Unverified |
| 9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 27.5 | | Unverified |
| 10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 26.57 | | Unverified |