SOTAVerified

A Probabilistic Model for Data Redundancy in the Feature Domain

2023-09-24Unverified0· sign in to hype

Ghurumuruhan Ganesan

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

In this paper, we use a probabilistic model to estimate the number of uncorrelated features in a large dataset. Our model allows for both pairwise feature correlation (collinearity) and interdependency of multiple features (multicollinearity) and we use the probabilistic method to obtain upper and lower bounds of the same order, for the size of a feature set that exhibits low collinearity and low multicollinearity. We also prove an auxiliary result regarding mutually good constrained sets that is of independent interest.

Tasks

Reproductions