Topology-Preserving Scaling in Data Augmentation

2024-11-29

Vu-Anh Le, Mehmet Dik


Abstract

We propose an algorithmic framework for dataset normalization in data augmentation pipelines that preserves topological stability under non-uniform scaling transformations. Given a finite metric space \( X \subset \mathbb{R}^n \) with Euclidean distance \( d_X \), we consider scaling transformations defined by scaling factors \( s_1, s_2, \ldots, s_n > 0 \). Specifically, we define a scaling function \( S \) that maps each point \( x = (x_1, x_2, \ldots, x_n) \in X \) to \[ S(x) = (s_1 x_1, s_2 x_2, \ldots, s_n x_n). \] Our main result establishes that the bottleneck distance \( d_B(D, D_S) \) between the persistence diagrams \( D \) of \( X \) and \( D_S \) of \( S(X) \) satisfies: \[ d_B(D, D_S) \leq (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X), \] where \( s_{\max} = \max_{1 \leq i \leq n} s_i \), \( s_{\min} = \min_{1 \leq i \leq n} s_i \), and \( \operatorname{diam}(X) \) is the diameter of \( X \). Based on this theoretical guarantee, we formulate an optimization problem to minimize the scaling variability \( \Delta_s = s_{\max} - s_{\min} \) under the constraint \( d_B(D, D_S) \leq \epsilon \), where \( \epsilon > 0 \) is a user-defined tolerance. We develop an algorithmic solution to this problem, ensuring that data augmentation via scaling transformations preserves essential topological features. We further extend our analysis to higher-dimensional homological features, alternative metrics such as the Wasserstein distance, and iterative or probabilistic scaling scenarios. Our contributions provide a rigorous mathematical framework for dataset normalization in data augmentation pipelines, ensuring that essential topological characteristics are maintained despite scaling transformations.
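The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it only evaluates the right-hand side of the stated bound, \( (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X) \), and the largest scaling variability \( \Delta_s \) for which that right-hand side stays within a tolerance \( \epsilon \). The function names (`diameter`, `scale`, `bottleneck_bound`, `max_scaling_variability`) and the sample points are hypothetical, chosen for illustration, and no persistence diagrams are computed.

```python
import itertools
import math


def diameter(points):
    """Largest Euclidean distance between any two points of a finite set."""
    return max(
        (math.dist(p, q) for p, q in itertools.combinations(points, 2)),
        default=0.0,
    )


def scale(points, factors):
    """Apply S(x) = (s_1 x_1, ..., s_n x_n) to every point."""
    return [tuple(s * x for s, x in zip(factors, p)) for p in points]


def bottleneck_bound(points, factors):
    """Right-hand side of the abstract's bound: (s_max - s_min) * diam(X)."""
    return (max(factors) - min(factors)) * diameter(points)


def max_scaling_variability(points, eps):
    """Largest Delta_s for which Delta_s * diam(X) <= eps (assumed reading)."""
    diam = diameter(points)
    return math.inf if diam == 0 else eps / diam


# Hypothetical toy data: three points in R^2 and two scaling factors.
X = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
s = (1.0, 1.5)

scaled_X = scale(X, s)
bound = bottleneck_bound(X, s)
delta_max = max_scaling_variability(X, eps=1.0)
print(scaled_X, bound, delta_max)
```

Under this reading, keeping \( \Delta_s \leq \epsilon / \operatorname{diam}(X) \) is sufficient for the bound to fall below \( \epsilon \). Checking the actual bottleneck distance would require a persistent homology library, which this sketch deliberately leaves out.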
