Case-Enhanced Vision Transformer: Improving Explanations of Image Similarity with a ViT-based Similarity Metric

2024-07-24 · Code Available

Ziwei Zhao, David Leake, Xiaomeng Ye, David Crandall


Abstract

This short paper presents preliminary research on the Case-Enhanced Vision Transformer (CEViT), a similarity measurement method aimed at improving the explainability of similarity assessments for image data. Initial experimental results suggest that integrating CEViT into k-Nearest Neighbor (k-NN) classification yields classification accuracy comparable to state-of-the-art computer vision models, while adding capabilities for illustrating differences between classes. CEViT explanations can be influenced by prior cases, to illustrate aspects of similarity relevant to those cases.
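To make the k-NN integration described in the abstract concrete, the following is a minimal sketch of a k-NN classifier that accepts a pluggable similarity function. It is not the authors' implementation: `cosine_similarity` is only a stand-in for a learned metric such as CEViT, and the names `knn_classify`, `cases`, and `labels` are hypothetical. The classifier returns the indices of the retrieved cases alongside the predicted label, since those prior cases are what a case-based explanation would present.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in metric (assumption): a learned model such as CEViT would go here."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def knn_classify(
    query: Vector,
    cases: List[Vector],
    labels: List[str],
    similarity: Similarity = cosine_similarity,
    k: int = 3,
) -> Tuple[str, List[int]]:
    """Return the majority label of the k most similar cases and their indices."""
    scores = [similarity(query, case) for case in cases]
    # Higher score means more similar, so sort in descending order.
    nearest = sorted(range(len(cases)), key=lambda i: scores[i], reverse=True)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest


# Toy feature vectors (hypothetical data, not from the paper).
cases = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["a", "a", "b", "b"]
label, nearest = knn_classify([1.0, 0.05], cases, labels, k=3)
```

Because the metric is passed in as a function, swapping the stand-in for a model-based similarity score would not require changes to the voting logic.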
