OPDR: Order-Preserving Dimension Reduction for Semantic Embedding of Multimodal Scientific Data

2024-08-15

Chengyu Gong, Gefei Shen, Luanzheng Guo, Nathan Tallent, Dongfang Zhao

Abstract

One of the most common operations in multimodal scientific data management is searching the database for the k items most similar to a newly provided item (the k-nearest neighbors, KNN). Although recent advances in multimodal machine learning models offer a semantic index, namely the embedding vectors mapped from the original multimodal data, these embedding vectors usually have hundreds to a thousand dimensions, which is impractically high for time-sensitive scientific applications. This work proposes to reduce the dimensionality of the output embedding vectors such that the set of top-k nearest neighbors does not change in the lower-dimensional space, an approach we call Order-Preserving Dimension Reduction (OPDR). To develop such an OPDR method, our central hypothesis is that by analyzing the intrinsic relationships among the key parameters of the dimension-reduction map, one can construct a quantitative function that reveals the correlation between the target (lower) dimensionality and other variables. To test this hypothesis, this paper first defines a formal measure function that quantifies the KNN similarity for a specific vector, then extends the measure into an aggregate accuracy over the global metric space, and finally derives a closed-form function between the target (lower) dimensionality and other variables. We incorporate the closed-form function into popular dimension-reduction methods, various distance metrics, and embedding models.
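The abstract does not spell out the measure function, so the following is only a minimal sketch under stated assumptions. It treats the per-vector KNN similarity as the fraction of a vector's top-k Euclidean neighbors that remain top-k neighbors after reduction, averages that fraction over all vectors as the aggregate accuracy, and uses a seeded Gaussian random projection as a stand-in reducer. The names `knn_preservation`, `aggregate_accuracy`, `random_projection`, and `smallest_preserving_dim` are hypothetical; the last one searches candidate dimensions empirically and is not the paper's closed-form function.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Set

Vector = List[float]
Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_indices(data: Sequence[Vector], i: int, k: int, dist: Metric = euclidean) -> Set[int]:
    """Indices of the k nearest neighbors of data[i], excluding i itself.
    Ties are broken by index so the result is deterministic."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: (dist(data[i], data[j]), j))
    return set(others[:k])


def knn_preservation(high: Sequence[Vector], low: Sequence[Vector], i: int, k: int,
                     dist: Metric = euclidean) -> float:
    """Hypothetical per-vector measure: share of vector i's top-k neighbors in the
    original space that are still top-k neighbors in the reduced space."""
    kept = knn_indices(high, i, k, dist) & knn_indices(low, i, k, dist)
    return len(kept) / k


def aggregate_accuracy(high: Sequence[Vector], low: Sequence[Vector], k: int,
                       dist: Metric = euclidean) -> float:
    """Hypothetical aggregate accuracy: mean per-vector measure over the dataset."""
    return sum(knn_preservation(high, low, i, k, dist) for i in range(len(high))) / len(high)


def random_projection(data: Sequence[Vector], target_dim: int, seed: int = 0) -> List[Vector]:
    """Stand-in reducer: multiply each vector by a seeded Gaussian matrix whose
    entries are drawn from N(0, 1/target_dim), yielding target_dim coordinates."""
    rng = random.Random(seed)
    d = len(data[0])
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(target_dim) for _ in range(target_dim)]
            for _ in range(d)]
    return [[sum(v[r] * proj[r][c] for r in range(d)) for c in range(target_dim)]
            for v in data]


def smallest_preserving_dim(data: Sequence[Vector], k: int, threshold: float,
                            candidates: Sequence[int], seed: int = 0) -> Optional[int]:
    """Empirical search (not the paper's closed form): the smallest candidate
    target dimension whose aggregate accuracy reaches threshold, else None."""
    for m in sorted(candidates):
        reduced = random_projection(data, m, seed)
        if aggregate_accuracy(data, reduced, k) >= threshold:
            return m
    return None


if __name__ == "__main__":
    # Synthetic stand-in for embedding vectors: 100 random 32-dimensional points.
    rng = random.Random(42)
    embeddings = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(100)]
    for m in (2, 8, 16):
        reduced = random_projection(embeddings, m)
        print(m, round(aggregate_accuracy(embeddings, reduced, k=5), 3))
    print(smallest_preserving_dim(embeddings, k=5, threshold=0.8, candidates=[4, 8, 16, 32]))
```

Because the sketched measure compares only neighbor sets, any map that scales all distances uniformly scores 1.0, while a lossy reducer such as a low-dimensional random projection typically scores lower as the target dimension shrinks. Swapping in another metric only requires passing a different `dist` callable.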