A Probabilistic Model for Data Redundancy in the Feature Domain
2023-09-24
Ghurumuruhan Ganesan
Abstract
In this paper, we use a probabilistic model to estimate the number of uncorrelated features in a large dataset. Our model allows for both pairwise feature correlation (collinearity) and interdependency of multiple features (multicollinearity). Using the probabilistic method, we obtain upper and lower bounds of the same order for the size of a feature set that exhibits both low collinearity and low multicollinearity. We also prove an auxiliary result regarding mutually good constrained sets that is of independent interest.
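
To make the distinction between collinearity and multicollinearity concrete, the following is a minimal illustrative sketch and not the method of the paper, which derives probabilistic bounds rather than an algorithm. It greedily keeps a feature only if its absolute Pearson correlation with every kept feature is small (low collinearity) and if the kept features together explain only a small fraction of its variance (low multicollinearity, measured here by R^2). The function name `greedy_low_redundancy`, the thresholds `pair_tol` and `multi_tol`, and the toy data are all assumptions introduced for illustration.

```python
import math

def center(col):
    """Subtract the mean so that dot products behave like covariances."""
    m = sum(col) / len(col)
    return [x - m for x in col]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def correlation(u, v):
    """Pearson correlation of two centered columns (0.0 if either is constant)."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot(u, v) / (nu * nv)

def greedy_low_redundancy(columns, pair_tol=0.6, multi_tol=0.8):
    """Hypothetical greedy filter; returns indices of the kept features.

    pair_tol  : largest allowed |correlation| with any kept feature.
    multi_tol : largest allowed R^2 when regressing a feature on all
                kept features (assumed to be below 1).
    """
    centered = [center(c) for c in columns]
    kept = []
    basis = []  # orthonormal basis of the span of the kept centered features
    for j, c in enumerate(centered):
        total = dot(c, c)
        if total == 0:
            continue  # constant feature carries no information
        # Collinearity check: pairwise correlation with each kept feature.
        if any(abs(correlation(c, centered[i])) > pair_tol for i in kept):
            continue
        # Multicollinearity check: project onto the kept span (Gram-Schmidt);
        # the residual variance fraction equals 1 - R^2.
        r = list(c)
        for b in basis:
            coef = dot(r, b)
            r = [x - coef * y for x, y in zip(r, b)]
        resid = dot(r, r)
        if 1.0 - resid / total > multi_tol:
            continue
        kept.append(j)
        norm = math.sqrt(resid)
        basis.append([x / norm for x in r])
    return kept

# Toy data (four samples per feature), chosen only for illustration.
a = [1, 1, -1, -1]
b = [1, -1, 1, -1]
c = [1, -1, -1, 1]
e = [7, 7, 3, 3]      # e = 2*a + 5: collinear with a
d = [3, -1, -1, -1]   # d = a + b + c: only moderately correlated with each
                      # of a, b, c, yet fully determined by them jointly
features = [a, e, b, c, d]
kept = greedy_low_redundancy(features, pair_tol=0.6, multi_tol=0.8)
print(kept)  # [0, 2, 3]
```

In this toy input, feature `e` is rejected by the pairwise check alone, whereas `d` passes every pairwise check (each correlation is about 0.58) and is rejected only by the joint check. This is the kind of redundancy that a purely pairwise criterion would miss.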