SOTAVerified

Visual Prompt Tuning

Visual Prompt Tuning (VPT) introduces only a small number of task-specific learnable parameters into the input space while keeping the entire pre-trained Transformer backbone frozen during downstream training. In practice, these additional parameters are simply prepended to the input sequence of the Transformer layers and learned jointly with a linear head during fine-tuning. VPT is especially effective in the low-data regime and maintains its advantage across data scales. Finally, VPT is competitive across a range of Transformer scales and designs (ViT-Base/Large/Huge, Swin). Taken together, the results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.
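As a minimal sketch of the idea (not the authors' implementation; all dimensions and variable names below are illustrative), the shallow variant of VPT prepends a set of learnable prompt vectors to the embedded patch-token sequence, and only those prompts plus the linear head receive gradients while the backbone stays frozen:

```python
import numpy as np

# Illustrative dimensions (not taken from the paper's configs)
D = 16   # embedding dimension
P = 4    # number of learnable prompt tokens
N = 9    # number of patch tokens
C = 5    # number of downstream classes

rng = np.random.default_rng(0)

# Frozen pre-trained pieces (stand-ins for the Transformer backbone's inputs)
patch_tokens = rng.normal(size=(N, D))   # embedded image patches
cls_token = rng.normal(size=(1, D))      # pre-trained [CLS] embedding

# The only trainable parameters in VPT: prompt tokens + linear head
prompts = np.zeros((P, D))               # task-specific prompts, learned downstream
head_W = np.zeros((D, C))
head_b = np.zeros(C)

# VPT-Shallow: prepend the prompts to the input sequence of the first layer
x = np.concatenate([cls_token, prompts, patch_tokens], axis=0)
assert x.shape == (1 + P + N, D)         # sequence grows by P tokens

# After the (frozen) backbone, classify from the [CLS] position
logits = x[0] @ head_W + head_b

# Parameter budget: only prompts and the head are updated during fine-tuning
trainable = prompts.size + head_W.size + head_b.size
```

VPT-Deep differs only in that a fresh set of prompts is injected at the input of every Transformer layer rather than just the first, which is why its prompt parameter count scales with depth.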

Papers

Showing 1–50 of 70 papers

Title | Status | Hype
Visual Prompt Tuning | Code | 3
Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis | Code | 2
CoLLaVO: Crayon Large Language and Vision mOdel | Code | 2
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Code | 2
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis | Code | 2
CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning | Code | 1
Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning | Code | 1
SA^2VP: Spatially Aligned-and-Adapted Visual Prompt | Code | 1
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning | Code | 1
TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt | Code | 1
Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition | Code | 1
Improving Visual Prompt Tuning for Self-supervised Vision Transformers | Code | 1
TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception | Code | 1
TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding | Code | 1
Online Class Incremental Learning on Stochastic Blurry Task Boundary via Mask and Visual Prompt Tuning | Code | 1
CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task | Code | 1
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers | Code | 1
Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning? | Code | 1
Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model | Code | 1
Understanding Zero-Shot Adversarial Robustness for Large-Scale Models | Code | 1
Unified Vision and Language Prompt Learning | Code | 1
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models | Code | 1
Visual Fourier Prompt Tuning | Code | 1
Visual Prompt Tuning for Generative Transfer Learning | Code | 1
Visual Prompt Tuning in Null Space for Continual Learning | Code | 1
Learning Disentangled Prompts for Compositional Image Synthesis | Code | 1
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning | Code | 1
Multitask Vision-Language Prompt Tuning | Code | 1
Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting | Code | 1
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Code | 0
Revisiting the Power of Prompt for Visual Tuning | Code | 0
SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model's Parameter-Efficient Fine-Tuning | Code | 0
Med-PerSAM: One-Shot Visual Prompt Tuning for Personalized Segment Anything Model in Medical Domain | Code | 0
From Question to Exploration: Test-Time Adaptation in Semantic Segmentation? | Code | 0
DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for Medical Image Analysis | Code | 0
Semantic Hierarchical Prompt Tuning for Parameter-Efficient Fine-Tuning | Code | 0
Iterative Prompt Relocation for Distribution-Adaptive Visual Prompt Tuning | Code | 0
Attention to Burstiness: Low-Rank Bilinear Prompt Tuning | Code | 0
VPA: Fully Test-Time Visual Prompt Adaptation | - | 0
Adaptive Prompt Tuning: Vision Guided Prompt Tuning with Cross-Attention for Fine-Grained Few-Shot Learning | - | 0
Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning | - | 0
AdMiT: Adaptive Multi-Source Tuning in Dynamic Environments | - | 0
Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning | - | 0
Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning | - | 0
Disentangled Prompt Representation for Domain Generalization | - | 0
Do We Really Need a Large Number of Visual Prompts? | - | 0
Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning | - | 0
End-to-end Multi-source Visual Prompt Tuning for Survival Analysis in Whole Slide Images | - | 0
Exploring Interpretability for Visual Prompt Tuning with Hierarchical Concepts | - | 0
Fair-VPT: Fair Visual Prompt Tuning for Image Classification | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 86 | - | Unverified
2 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 84.08 | - | Unverified
3 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 83.26 | - | Unverified
4 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.12 | - | Unverified
5 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83 | - | Unverified
6 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 79.26 | - | Unverified
7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 73.95 | - | Unverified
8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 73.39 | - | Unverified
9 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 72.02 | - | Unverified
10 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 57.84 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 76.2 | - | Unverified
2 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 74.84 | - | Unverified
3 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 74.47 | - | Unverified
4 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 70.27 | - | Unverified
5 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 67.34 | - | Unverified
6 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 67.19 | - | Unverified
7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 62.53 | - | Unverified
8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 47.61 | - | Unverified
9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 39.96 | - | Unverified
10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 36.02 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 84.95 | - | Unverified
2 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.93 | - | Unverified
3 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.38 | - | Unverified
4 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 83.15 | - | Unverified
5 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.04 | - | Unverified
6 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 82.26 | - | Unverified
7 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 80.9 | - | Unverified
8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 76.86 | - | Unverified
9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 69.65 | - | Unverified
10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 60.61 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 59.23 | - | Unverified
2 | SPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 58.36 | - | Unverified
3 | SPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 55.16 | - | Unverified
4 | SPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 53.46 | - | Unverified
5 | GateVPT (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 49.1 | - | Unverified
6 | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 42.38 | - | Unverified
7 | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 37.55 | - | Unverified
8 | GateVPT (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 36.8 | - | Unverified
9 | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 27.5 | - | Unverified
10 | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 26.57 | - | Unverified