Domain Adaptation of Attention Heads for Zero-shot Anomaly Detection
Kiyoon Jeong, Jaehyuk Heo, Junyeong Son, Pilsung Kang
Abstract
Zero-shot anomaly detection (ZSAD) enables anomaly detection without normal samples from target categories, addressing scenarios where task-specific training data is unavailable. However, existing ZSAD methods either neglect adaptation of vision-language models to anomaly detection or implement only partial adaptation. This paper proposes Head-adaptive CLIP (HeadCLIP), which effectively adapts both text and image encoders. HeadCLIP employs learnable prompts in the text encoder to generalize normality and abnormality concepts, and introduces learnable head weights in the image encoder to dynamically adjust attention head features for task-specific adaptation. A joint anomaly score is further proposed to leverage adapted pixel-level information for enhanced image-level detection. Experiments on 17 datasets across industrial and medical domains demonstrate that HeadCLIP outperforms existing ZSAD methods at both pixel and image levels, achieving improvements of up to 4.9%p in pixel-level mean anomaly detection score (mAD) and 3.7%p in image-level mAD in the industrial domain, with comparable gains (3.2%p and 3.2%p, respectively) in the medical domain. Code and pretrained weights are available at https://github.com/kiyoonjeong0305/HeadCLIP.
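The two ideas named in the abstract (learnable per-head weights over attention-head features, and a joint score that folds pixel-level information into the image-level decision) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the softmax normalization of head weights, and the `alpha` mixing coefficient are all assumptions made for the example.

```python
import numpy as np

def weighted_head_features(head_feats, head_logits):
    """Combine per-attention-head patch features with learnable weights.

    head_feats:  array of shape (H, N, D) -- H heads, N patches, D dims.
    head_logits: array of shape (H,) -- learnable parameters; here we
                 assume a softmax turns them into mixing weights.
    Returns an (N, D) feature map with task-adapted head contributions.
    """
    w = np.exp(head_logits - head_logits.max())  # stable softmax
    w = w / w.sum()
    return (w[:, None, None] * head_feats).sum(axis=0)

def joint_anomaly_score(image_score, pixel_map, alpha=0.5):
    """Joint image-level score: blend the global score with the most
    anomalous pixel, so pixel-level adaptation informs image-level
    detection (alpha is an illustrative mixing weight)."""
    return alpha * image_score + (1.0 - alpha) * pixel_map.max()
```

With uniform head logits the combination reduces to a plain mean over heads; training would instead learn logits that up-weight heads whose features separate normal from anomalous regions.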