FewVS: A Vision-Semantics Integration Framework for Few-Shot Image Classification
Zhuoling Li, Yong Wang, Kaitong Li
Code: github.com/zhuolingli/FewVS (PyTorch)
Abstract
Some recent methods address few-shot image classification by extracting semantic information from class names and devising mechanisms for aligning vision and semantics to integrate information from both modalities. However, class names provide only limited information, which is insufficient to capture the visual details in images. As a result, such vision-semantics alignment is inherently biased, leading to suboptimal integration outcomes. In this paper, we avoid such biased vision-semantics alignment by introducing CLIP, a natural bridge between vision and semantics, and enforcing unbiased vision-vision alignment as a proxy task. Specifically, we align features encoded from the few-shot encoder and CLIP's vision encoder on the same image. This alignment is accomplished through a linear projection layer, with a training objective formulated using optimal transport-based assignment prediction. Thanks to...
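To make the alignment objective concrete, here is a minimal sketch of optimal transport-based assignment prediction between two feature views of the same images. It is written in the style of swapped prediction with Sinkhorn-Knopp normalization; the abstract does not give the exact formulation, so the shared prototypes, the temperatures, the iteration count, and every function name below are illustrative assumptions rather than the authors' implementation. Plain Python lists are used to keep the example dependency-free.

```python
import math

def project(x, weight):
    """Linear projection (no bias): maps a few-shot feature into the other view's space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]

def prototype_scores(features, prototypes):
    """Cosine similarity of every feature to every (hypothetical) prototype."""
    protos = [l2_normalize(p) for p in prototypes]
    return [[sum(a * b for a, b in zip(l2_normalize(f), p)) for p in protos]
            for f in features]

def sinkhorn(scores, eps=0.05, n_iters=3):
    """Soft assignments from scores, balanced across prototypes (Sinkhorn-Knopp)."""
    n, k = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # subtract the max for numerical stability
    q = [[math.exp((s - top) / eps) for s in row] for row in scores]
    for _ in range(n_iters):
        # Columns: each prototype receives total mass 1/k.
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col[j] * k) for j in range(k)] for i in range(n)]
        # Rows: each sample carries total mass 1/n.
        q = [[v / (sum(row) * n) for v in row] for row in q]
    return [[v * n for v in row] for row in q]  # rescale so each row sums to 1

def log_softmax(row, temp=0.1):
    top = max(row)
    shifted = [(s - top) / temp for s in row]
    log_z = math.log(sum(math.exp(v) for v in shifted))
    return [v - log_z for v in shifted]

def swapped_prediction_loss(scores_a, scores_b):
    """Each view predicts the assignment computed from the other view."""
    q_a, q_b = sinkhorn(scores_a), sinkhorn(scores_b)
    loss = 0.0
    for i in range(len(scores_a)):
        lp_a, lp_b = log_softmax(scores_a[i]), log_softmax(scores_b[i])
        loss -= sum(q * lp for q, lp in zip(q_b[i], lp_a))
        loss -= sum(q * lp for q, lp in zip(q_a[i], lp_b))
    return loss / (2 * len(scores_a))
```

In use, `scores_a` would come from projected few-shot features and `scores_b` from CLIP vision features of the same images, both scored against the same prototypes; in a real training loop the assignments returned by `sinkhorn` would be treated as fixed targets, with gradients flowing only through the `log_softmax` branch.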