Density based Spatial Clustering of Lines via Probabilistic Generation of Neighbourhood

2024-10-03

Akanksha Das, Malay Bhattacharyya

Abstract

Density-based spatial clustering of points in R^n has many applications across a variety of industries. We generalise this problem to the density-based clustering of lines in high-dimensional spaces, keeping in mind that no valid distance measure for lines satisfies the triangle inequality. In this paper, we design a clustering algorithm that generates, for each line, a customised neighbourhood of fixed volume (given as a parameter), shaped by an optional parameter in the form of a continuous probability density function. The algorithm is not sensitive to outliers and can effectively identify noise in the data using a cardinality parameter. One pivotal application of this algorithm is clustering data points in R^n with missing entries while utilising domain knowledge of the respective data. In particular, the proposed algorithm can cluster n-dimensional data points that contain at least (n-1)-dimensional information. We illustrate the neighbourhoods for standard probability distributions with continuous probability density functions and demonstrate the effectiveness of our algorithm on various synthetic and real-world datasets (e.g., rail and road networks). The experimental results also highlight its application in clustering incomplete data.
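
To make the motivation concrete, the following minimal sketch illustrates two ideas from the abstract: a point in R^n with one missing entry can be read as an axis-parallel line, and the usual shortest distance between lines fails the triangle inequality. It is not the paper's algorithm, which builds probabilistic neighbourhoods instead of relying on such a distance. The function names `line_from_incomplete` and `line_distance` are hypothetical and chosen only for illustration.

```python
import math

def line_from_incomplete(point):
    """Hypothetical helper: turn a point with exactly one missing
    coordinate (None) into a line (base point, direction)."""
    missing = [i for i, x in enumerate(point) if x is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one missing coordinate")
    k = missing[0]
    base = [0.0 if x is None else float(x) for x in point]
    direction = [1.0 if i == k else 0.0 for i in range(len(point))]
    return base, direction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def line_distance(line1, line2):
    """Shortest Euclidean distance between two lines in R^n,
    found by minimising |p1 + s*d1 - p2 - t*d2| over s and t."""
    (p1, d1), (p2, d2) = line1, line2
    w = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) <= 1e-12 * a * c:
        # Parallel lines: fix s = 0 and project onto the second line.
        s, t = 0.0, e / c
    else:
        s = (b * e - c * d) / denom
        t = (a * e - b * d) / denom
    diff = [wi + s * u - t * v for wi, u, v in zip(w, d1, d2)]
    return math.sqrt(dot(diff, diff))

# Three incomplete points in R^2, each missing one coordinate.
A = line_from_incomplete([None, 0.0])   # the line y = 0
B = line_from_incomplete([0.5, None])   # the line x = 0.5
C = line_from_incomplete([None, 1.0])   # the line y = 1

d_ab = line_distance(A, B)
d_bc = line_distance(B, C)
d_ac = line_distance(A, C)
# B crosses both A and C, yet A and C never meet,
# so d(A, C) exceeds d(A, B) + d(B, C).
violates_triangle = d_ac > d_ab + d_bc
```

Here the vertical line intersects both horizontal lines, so two of the pairwise distances are zero while the third is not. This is one way to see why a density-based method for lines cannot simply reuse a point-style metric.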
