Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

2023-12-11

Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao


Code

github.com/vill-lab/2024-aaai-hpt
Official · In paper · PyTorch · ★ 73
github.com/ThomasWangY/2024-AAAI-HPT
PyTorch · ★ 73

Abstract

Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. With the emergence of large language models (LLMs), recent studies have explored using category-related descriptions as input to enhance prompt effectiveness. However, conventional descriptions lack the structured information needed to represent the interconnections among the entities or attributes linked to a particular category. To address this limitation and prioritize harnessing structured knowledge, this paper advocates leveraging LLMs to build a graph for each description that models the entities and attributes describing the category, as well as their correlations. Existing prompt tuning methods are ill-equipped to handle this structured knowledge. Consequently, we propose a novel approach called Hierarchical Prompt Tuning (HPT), which enables simultaneous modeling of both structured and conventional linguistic knowledge. Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning. In addition, by incorporating high-level and global-level prompts that model overall semantics, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-term relationships. Extensive experiments demonstrate that HPT is highly effective and generalizes much better than existing state-of-the-art (SOTA) methods. Our code is available at https://github.com/Vill-Lab/2024-AAAI-HPT.
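The abstract does not specify how the relationship-guided attention module computes its scores. The following is a minimal sketch, assuming one common way to inject a graph into self-attention: each entity or attribute token may attend only to itself and to nodes it shares an edge with in the LLM-built graph, and every edge adds a fixed bias to the attention score. The function name `relation_guided_attention`, the `relation_bias` parameter, the omission of learned query/key/value projections, and the toy graph are all hypothetical. This is an illustration of graph-masked attention, not HPT's actual implementation (see the official repository for that).

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Numerically stable softmax; masked entries (-inf) become exactly 0.0.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relation_guided_attention(
    tokens: Matrix, adjacency: List[List[int]], relation_bias: float = 1.0
) -> Tuple[Matrix, Matrix]:
    """Single-head self-attention over entity/attribute tokens, guided by a graph.

    Hypothetical sketch: tokens serve directly as queries, keys and values
    (no learned projections). Token i may attend to token j only if i == j or
    adjacency[i][j] is nonzero; each such edge adds `relation_bias` to the score.
    """
    n, d = len(tokens), len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights: Matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            if i != j and not adjacency[i][j]:
                row.append(-math.inf)  # unrelated pair: masked out
                continue
            score = scale * sum(a * b for a, b in zip(tokens[i], tokens[j]))
            if i != j:
                score += relation_bias  # related pair: score boosted
            row.append(score)
        weights.append(softmax(row))  # diagonal is never masked, so max(row) is finite
    # Each output token is the attention-weighted sum of the token vectors.
    output = [
        [sum(weights[i][j] * tokens[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return output, weights


# Toy graph (hypothetical): node 0 is an entity such as "bird"; nodes 1 and 2 are
# attributes such as "red feathers" and "short beak", each linked only to node 0.
tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
out, attn = relation_guided_attention(tokens, adjacency)
print(attn[1][2])  # prints 0.0: the two attributes share no edge, so they cannot attend
```

Masking with `-inf` before the softmax is what turns the graph into a hard constraint; dropping the mask and keeping only the additive bias would give a soft variant in which unrelated pairs still interact, just with lower weight.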

Tasks

Prompt Engineering, Prompt Learning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Caltech-101	HPT	Harmonic mean	96.65	—	Unverified
DTD	HPT	Harmonic mean	72.16	—	Unverified
EuroSAT	HPT	Harmonic mean	84.82	—	Unverified
FGVC-Aircraft	HPT	Harmonic mean	40.28	—	Unverified
Food-101	HPT	Harmonic mean	91.01	—	Unverified
ImageNet	HPT	Harmonic mean	74.17	—	Unverified
ImageNet-A	HPT	Top-1 accuracy %	50.85	—	Unverified
ImageNet-R	HPT	Top-1 accuracy %	77.38	—	Unverified
ImageNet-S	HPT	Top-1 accuracy %	49.36	—	Unverified
ImageNet V2	HPT	Top-1 accuracy %	65.25	—	Unverified
Oxford 102 Flower	HPT	Harmonic mean	87.16	—	Unverified
Oxford-IIIT Pet Dataset	HPT	Harmonic mean	96.71	—	Unverified
Stanford Cars	HPT	Harmonic mean	75.57	—	Unverified
SUN397	HPT	Harmonic mean	80.88	—	Unverified
UCF101	HPT	Harmonic mean	83.16	—	Unverified
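Most rows above report a harmonic mean. Assuming the usual base-to-novel generalization protocol, in which a model is trained on base classes and evaluated on both base and novel classes, the score is HM = 2·B·N / (B + N), where B and N are base-class and novel-class accuracy in %. The page lists only the resulting HM values, so the accuracies in this minimal sketch are hypothetical, not taken from the paper.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base-class and novel-class accuracy (both in %)."""
    if base_acc + novel_acc == 0:
        return 0.0  # avoid division by zero when both accuracies are zero
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


# Hypothetical accuracies, for illustration only.
hm = harmonic_mean(80.0, 70.0)
print(f"{hm:.2f}")  # prints 74.67
```

The harmonic mean is preferred over the arithmetic mean here because it penalizes imbalance: `harmonic_mean(95.0, 50.0)` is about 65.5, while the arithmetic mean of the same pair is 72.5, so a method cannot score well by overfitting to base classes at the expense of novel ones.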
