SOTAVerified

Visual Prompt Tuning

Visual Prompt Tuning(VPT) only introduces a small amount of task-specific learnable parameters into the input space while freezing the entire pre-trained Transformer backbone during downstream training. In practice, these additional parameters are simply prepended into the input sequence of each Transformer layer and learned together with a linear head during fine-tuning. VPT is especially effective in the low-data regime, and maintains its advantage across data scales. Finally, VPT is competitive for a range of Transformer scales and designs (ViTBase/Large/Huge, Swin). Put together, the results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.

Papers

Showing 2130 of 70 papers

TitleStatusHype
Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?Code1
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision TransformersCode1
Dual Modality Prompt Tuning for Vision-Language Pre-Trained ModelCode1
Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction TuningCode1
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated LearningCode1
Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual RecognitionCode1
Improving Visual Prompt Tuning for Self-supervised Vision TransformersCode1
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud ModelsCode1
Unified Vision and Language Prompt LearningCode1
Iterative Prompt Relocation for Distribution-Adaptive Visual Prompt TuningCode0
Show:102550
← PrevPage 3 of 7Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1SPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy86Unverified
2SPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy84.08Unverified
3SPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy83.26Unverified
4VPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy83.12Unverified
5GateVPT(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy83Unverified
6VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy79.26Unverified
7SPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy73.95Unverified
8GateVPT(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy73.39Unverified
9VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy72.02Unverified
10VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy57.84Unverified
#ModelMetricClaimedVerifiedStatus
1SPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy76.2Unverified
2GateVPT(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy74.84Unverified
3SPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy74.47Unverified
4VPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy70.27Unverified
5VPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy67.34Unverified
6SPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy67.19Unverified
7SPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy62.53Unverified
8GateVPT(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy47.61Unverified
9VPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy39.96Unverified
10VPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy36.02Unverified
#ModelMetricClaimedVerifiedStatus
1SPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy84.95Unverified
2SPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy83.93Unverified
3GateVPT(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy83.38Unverified
4SPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy83.15Unverified
5VPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy83.04Unverified
6VPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy82.26Unverified
7SPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy80.9Unverified
8GateVPT(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy76.86Unverified
9VPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy69.65Unverified
10VPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy60.61Unverified
#ModelMetricClaimedVerifiedStatus
1SPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy59.23Unverified
2SPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy58.36Unverified
3SPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy55.16Unverified
4SPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy53.46Unverified
5GateVPT(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy49.1Unverified
6VPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy42.38Unverified
7VPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K)Mean Accuracy37.55Unverified
8GateVPT(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy36.8Unverified
9VPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy27.5Unverified
10VPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K)Mean Accuracy26.57Unverified