SOTAVerified

Combinatorial optimization of the coefficient of determination

2024-10-12Code Available0· sign in to hype

Marc Harary

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

Robust correlation analysis is among the most critical challenges in statistics. Herein, we develop an efficient algorithm for selecting the k- subset of n points in the plane with the highest coefficient of determination ( R^2 ). Drawing from combinatorial geometry, we propose a method called the quadratic sweep that consists of two steps: (i) projectively lifting the data points into R^5 and then (ii) iterating over each linearly separable k-subset. Its basis is that the optimal set of outliers is separable from its complement in R^2 by a conic section, which, in R^5, can be found by a topological sweep in ( n^5 n ) time. Although key proofs of quadratic separability remain underway, we develop strong mathematical intuitions for our conjectures, then experimentally demonstrate our method's optimality over several million trials up to n=30 without error. Implementations in Julia and fully seeded, reproducible experiments are available at https://github.com/marc-harary/QuadraticSweep.

Tasks

Reproductions