
FewVS: A Vision-Semantics Integration Framework for Few-Shot Image Classification

2024-10-28 · Proceedings of the 32nd ACM International Conference on Multimedia (MM 2024)

Zhuoling Li, Yong Wang, Kaitong Li


Abstract

Some recent methods address few-shot image classification by extracting semantic information from class names and devising mechanisms for aligning vision and semantics to integrate information from both modalities. However, class names provide only limited information, which is insufficient to capture the visual details in images. As a result, such vision-semantics alignment is inherently biased, leading to suboptimal integration outcomes. In this paper, we avoid such biased vision-semantics alignment by introducing CLIP, a natural bridge between vision and semantics, and enforcing unbiased vision-vision alignment as a proxy task. Specifically, we align features encoded from the few-shot encoder and CLIP's vision encoder on the same image. This alignment is accomplished through a linear projection layer, with a training objective formulated using optimal transport-based assignment prediction. Thanks to...
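The abstract describes aligning features from the few-shot encoder and CLIP's vision encoder through a linear projection layer, with an optimal transport-based assignment-prediction objective. The paper's implementation details are not shown on this page, so the following is only a rough illustrative sketch: the feature dimensions, the number of prototypes, the Sinkhorn-style target construction, and the temperature value are all assumptions, not the authors' settings.

```python
import numpy as np

def sinkhorn(scores, eps=0.05, n_iters=3):
    """Sinkhorn-Knopp normalization: turn a score matrix into a
    soft-assignment matrix whose rows (images) each sum to 1 and
    whose columns (prototypes) are approximately balanced."""
    Q = np.exp(scores / eps)
    Q /= Q.sum()
    B, K = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True); Q /= K  # balance columns
        Q /= Q.sum(axis=1, keepdims=True); Q /= B  # balance rows
    return Q * B  # each row now sums to 1

rng = np.random.default_rng(0)
B, d_few, d_clip, K = 8, 64, 512, 16          # assumed sizes

z_few = rng.normal(size=(B, d_few))            # few-shot encoder features (placeholder)
z_clip = rng.normal(size=(B, d_clip))          # CLIP vision features of the same images

W = rng.normal(size=(d_few, d_clip)) * 0.05    # the linear projection layer
prototypes = rng.normal(size=(K, d_clip))
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Scores of each branch against a shared set of prototypes.
p_few = l2norm(z_few @ W) @ prototypes.T       # predicted assignments
p_clip = l2norm(z_clip) @ prototypes.T         # scores from the CLIP branch

# OT-based soft targets from the CLIP branch; the few-shot branch is
# trained to predict them (cross-entropy against a log-softmax).
targets = sinkhorn(p_clip)
tau = 0.1                                       # assumed temperature
log_pred = p_few / tau - np.log(np.exp(p_few / tau).sum(axis=1, keepdims=True))
loss = -(targets * log_pred).sum(axis=1).mean()
```

Because both branches see the same image, this vision-vision alignment sidesteps the bias of matching images against class-name semantics; CLIP's vision encoder, already aligned with language during pre-training, carries the semantic side implicitly.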
