SOTAVerified

Visual Prompt Tuning

Visual Prompt Tuning (VPT) introduces only a small number of task-specific learnable parameters into the input space while keeping the entire pre-trained Transformer backbone frozen during downstream training. In practice, these additional parameters are prepended to the input sequence of each Transformer layer and learned together with a linear head during fine-tuning. VPT is especially effective in the low-data regime and maintains its advantage across data scales. It is also competitive across a range of Transformer scales and designs (ViT-Base/Large/Huge, Swin). Taken together, the results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.
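The mechanism described above can be illustrated with a minimal, framework-free sketch. It is not the reference implementation: the function names (`forward_with_prompts`, `vpt_trainable_params`), the identity stand-in for a frozen Transformer layer, and the example sizes (10 prompts, 100 classes) are all assumptions made for illustration. The hidden size of 768 and depth of 12 are the usual ViT-Base values. Handling of the class token is omitted for brevity.

```python
import random


def make_prompts(num_layers, num_prompts, hidden_dim, seed=0):
    """Create one set of prompt vectors per layer (hypothetical initialisation)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
         for _ in range(num_prompts)]
        for _ in range(num_layers)
    ]


def frozen_layer(sequence):
    """Stand-in for a frozen Transformer block (identity; an assumption).

    A real block would mix information across tokens; its weights would
    simply receive no gradient updates.
    """
    return sequence


def forward_with_prompts(patch_tokens, prompts_per_layer):
    """Prepend each layer's prompts to that layer's input sequence."""
    hidden = list(patch_tokens)
    for layer_prompts in prompts_per_layer:
        sequence = list(layer_prompts) + hidden   # prompts go in front
        output = frozen_layer(sequence)
        hidden = output[len(layer_prompts):]      # keep only the patch positions
    return hidden


def vpt_trainable_params(num_prompts, hidden_dim, num_layers, num_classes, deep=True):
    """Count trainable parameters: the prompts plus the linear head only."""
    prompt_layers = num_layers if deep else 1
    prompt_params = prompt_layers * num_prompts * hidden_dim
    head_params = hidden_dim * num_classes + num_classes  # weights + bias
    return prompt_params + head_params


if __name__ == "__main__":
    prompts = make_prompts(num_layers=12, num_prompts=10, hidden_dim=768)
    patches = [[0.0] * 768 for _ in range(196)]  # 14 x 14 patches for a 224 px image
    features = forward_with_prompts(patches, prompts)
    print(len(features), "patch tokens out")
    print(vpt_trainable_params(10, 768, 12, 100, deep=True), "trainable (deep)")
    print(vpt_trainable_params(10, 768, 12, 100, deep=False), "trainable (shallow)")
```

Under these assumed sizes, the trainable parameter count stays in the hundreds of thousands at most, which is small next to the tens of millions of frozen weights in a ViT-Base backbone.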

Papers

Showing 1–10 of 70 papers

| Title | Status | Hype |
|---|---|---|
| Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization | | 0 |
| Attention to Burstiness: Low-Rank Bilinear Prompt Tuning | Code | 0 |
| DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers | Code | 1 |
| Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning | Code | 1 |
| Visual Variational Autoencoder Prompt Tuning | | 0 |
| Iterative Prompt Relocation for Distribution-Adaptive Visual Prompt Tuning | Code | 0 |
| Exploring Interpretability for Visual Prompt Tuning with Hierarchical Concepts | | 0 |
| Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning | | 0 |
| Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis | Code | 2 |
| Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 76.2 | | Unverified |
| 2 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 74.84 | | Unverified |
| 3 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 74.47 | | Unverified |
| 4 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 70.27 | | Unverified |
| 5 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 67.34 | | Unverified |
| 6 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 67.19 | | Unverified |
| 7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 62.53 | | Unverified |
| 8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 47.61 | | Unverified |
| 9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 39.96 | | Unverified |
| 10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 36.02 | | Unverified |