Memory-Efficient Sequential Pattern Mining with Hybrid Tries

2022-02-06

Amin Hosseininasab, Willem-Jan van Hoeve, Andre A. Cire

Abstract

This paper develops a memory-efficient approach for Sequential Pattern Mining (SPM), a fundamental topic in knowledge discovery that faces a well-known memory bottleneck for large data sets. Our methodology involves a novel hybrid trie data structure that exploits recurring patterns to compactly store the data set in memory, and a corresponding mining algorithm designed to effectively extract patterns from this compact representation. Numerical results on small- to medium-sized real-life test instances show an average improvement of 85% in memory consumption and 49% in computation time compared to the state of the art. For large data sets, our algorithm stands out as the only SPM approach capable of running within 256 GB of system memory, potentially saving 1.7 TB in memory consumption.
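
As a rough illustration of why a trie can store a sequence data set compactly, the sketch below builds a plain prefix trie in Python. It is not the hybrid trie proposed in the paper, whose construction the abstract does not describe. The names `TrieNode`, `build_prefix_trie`, `count_nodes`, and `prefix_support`, as well as the toy `database`, are hypothetical and chosen only for this example. The only point shown is that sequences sharing a prefix reuse the same nodes, so fewer items are stored than in a flat list.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One node of a plain prefix trie over item sequences (illustrative only)."""
    count: int = 0  # number of stored sequences passing through this node
    children: dict = field(default_factory=dict)  # item -> TrieNode


def build_prefix_trie(sequences):
    """Insert every sequence; sequences with a common prefix share nodes."""
    root = TrieNode()
    for seq in sequences:
        node = root
        node.count += 1
        for item in seq:
            node = node.children.setdefault(item, TrieNode())
            node.count += 1
    return root


def count_nodes(node):
    """Number of non-root nodes, i.e. items actually stored in the trie."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def prefix_support(root, prefix):
    """Number of stored sequences that start with `prefix`.

    Note: this is prefix frequency, not the subsequence-based support
    used in sequential pattern mining.
    """
    node = root
    for item in prefix:
        node = node.children.get(item)
        if node is None:
            return 0
    return node.count


# Toy data set (an assumption for illustration, not from the paper).
database = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "b", "c", "e"],
    ["b", "c"],
]

trie = build_prefix_trie(database)
total_items = sum(len(seq) for seq in database)  # items in the flat representation
stored_nodes = count_nodes(trie)                 # items after prefix sharing

print(f"flat items: {total_items}, trie nodes: {stored_nodes}")
print("sequences starting with a, b:", prefix_support(trie, ["a", "b"]))
```

On this toy input the flat representation holds 12 items while the trie holds 7 nodes, because three sequences share the prefix `a, b`. The savings in practice depend on how much the sequences overlap, and the figures reported in the abstract refer to the authors' hybrid structure rather than to this simplified version.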