SOTAVerified

Visual Prompt Tuning

Visual Prompt Tuning (VPT) introduces only a small set of task-specific learnable parameters into the input space while keeping the entire pre-trained Transformer backbone frozen during downstream training. In practice, these additional parameters are simply prepended to the input sequence of each Transformer layer and learned jointly with a linear head during fine-tuning. VPT is especially effective in the low-data regime and maintains its advantage across data scales. It is also competitive across a range of Transformer scales and designs (ViT-Base/Large/Huge, Swin). Taken together, these results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.
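The mechanism above can be sketched in a few lines. The following is a minimal NumPy toy, not the paper's implementation: the single frozen "layer" and mean pooling are stand-ins for a real ViT backbone, and all shapes and names are illustrative. The key point it demonstrates is that only the prompt tokens and the linear head are trainable; the backbone weights stay fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N, P = 16, 4, 2  # embed dim, patch tokens, prompt tokens (toy sizes)

# Frozen pre-trained weights (stand-in for a real Transformer layer).
W_frozen = rng.standard_normal((D, D)) / np.sqrt(D)

# Task-specific learnable parameters: prompt tokens + linear head only.
prompts = rng.standard_normal((P, D)) * 0.02  # learnable prompts
W_head = rng.standard_normal((D, 3)) * 0.02   # learnable head, 3 classes

def vpt_forward(x_patches):
    """VPT-Shallow-style forward: prepend prompts, run the frozen
    backbone layer, pool, and classify with the linear head."""
    seq = np.concatenate([prompts, x_patches], axis=0)  # (P + N, D)
    seq = np.tanh(seq @ W_frozen)                       # frozen backbone
    pooled = seq.mean(axis=0)                           # toy pooling
    return pooled @ W_head                              # class logits

x = rng.standard_normal((N, D))  # stand-in patch embeddings
logits = vpt_forward(x)
print(logits.shape)  # (3,)
```

During fine-tuning, gradients would flow only into `prompts` and `W_head` (here `P*D + D*3 = 80` values) while `W_frozen` is never updated, which is what keeps VPT's trainable-parameter count tiny relative to the backbone. VPT-Deep additionally inserts a fresh set of prompts at every layer rather than only at the input.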

Papers

Showing 1–50 of 70 papers

| Title | Status | Hype |
|---|---|---|
| Visual Prompt Tuning | Code | 3 |
| Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis | Code | 2 |
| ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Code | 2 |
| CoLLaVO: Crayon Large Language and Vision mOdel | Code | 2 |
| Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis | Code | 2 |
| Multitask Vision-Language Prompt Tuning | Code | 1 |
| Online Class Incremental Learning on Stochastic Blurry Task Boundary via Mask and Visual Prompt Tuning | Code | 1 |
| CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task | Code | 1 |
| DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers | Code | 1 |
| Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning? | Code | 1 |
| Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model | Code | 1 |
| Visual Prompt Tuning for Generative Transfer Learning | Code | 1 |
| CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning | Code | 1 |
| Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning | Code | 1 |
| Understanding Zero-Shot Adversarial Robustness for Large-Scale Models | Code | 1 |
| SA^2VP: Spatially Aligned-and-Adapted Visual Prompt | Code | 1 |
| Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting | Code | 1 |
| Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning | Code | 1 |
| TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt | Code | 1 |
| Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition | Code | 1 |
| Improving Visual Prompt Tuning for Self-supervised Vision Transformers | Code | 1 |
| TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception | Code | 1 |
| TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding | Code | 1 |
| Visual Fourier Prompt Tuning | Code | 1 |
| Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models | Code | 1 |
| Learning Disentangled Prompts for Compositional Image Synthesis | Code | 1 |
| Visual Prompt Tuning in Null Space for Continual Learning | Code | 1 |
| E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning | Code | 1 |
| Unified Vision and Language Prompt Learning | Code | 1 |
| VPA: Fully Test-Time Visual Prompt Adaptation | | 0 |
| Adaptive Prompt Tuning: Vision Guided Prompt Tuning with Cross-Attention for Fine-Grained Few-Shot Learning | | 0 |
| Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning | | 0 |
| AdMiT: Adaptive Multi-Source Tuning in Dynamic Environments | | 0 |
| Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning | | 0 |
| Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning | | 0 |
| Disentangled Prompt Representation for Domain Generalization | | 0 |
| Do We Really Need a Large Number of Visual Prompts? | | 0 |
| Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning | | 0 |
| End-to-end Multi-source Visual Prompt Tuning for Survival Analysis in Whole Slide Images | | 0 |
| Exploring Interpretability for Visual Prompt Tuning with Hierarchical Concepts | | 0 |
| Fair-VPT: Fair Visual Prompt Tuning for Image Classification | | 0 |
| Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models | | 0 |
| Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection | | 0 |
| iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection | | 0 |
| LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models | | 0 |
| LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning | | 0 |
| MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention | | 0 |
| MVP: Meta Visual Prompt Tuning for Few-Shot Remote Sensing Image Scene Classification | | 0 |
| Open Vocabulary Semantic Scene Sketch Understanding | | 0 |
| Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 86 | | Unverified |
| 2 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 84.08 | | Unverified |
| 3 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 83.26 | | Unverified |
| 4 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.12 | | Unverified |
| 5 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83 | | Unverified |
| 6 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 79.26 | | Unverified |
| 7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 73.95 | | Unverified |
| 8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 73.39 | | Unverified |
| 9 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 72.02 | | Unverified |
| 10 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 57.84 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 76.2 | | Unverified |
| 2 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 74.84 | | Unverified |
| 3 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 74.47 | | Unverified |
| 4 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 70.27 | | Unverified |
| 5 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 67.34 | | Unverified |
| 6 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 67.19 | | Unverified |
| 7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 62.53 | | Unverified |
| 8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 47.61 | | Unverified |
| 9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 39.96 | | Unverified |
| 10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 36.02 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 84.95 | | Unverified |
| 2 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.93 | | Unverified |
| 3 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.38 | | Unverified |
| 4 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 83.15 | | Unverified |
| 5 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.04 | | Unverified |
| 6 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 82.26 | | Unverified |
| 7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 80.9 | | Unverified |
| 8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 76.86 | | Unverified |
| 9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 69.65 | | Unverified |
| 10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 60.61 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 59.23 | | Unverified |
| 2 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 58.36 | | Unverified |
| 3 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 55.16 | | Unverified |
| 4 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 53.46 | | Unverified |
| 5 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 49.1 | | Unverified |
| 6 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 42.38 | | Unverified |
| 7 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 37.55 | | Unverified |
| 8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 36.8 | | Unverified |
| 9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 27.5 | | Unverified |
| 10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 26.57 | | Unverified |