FlexChunk: Enabling 100M×100M Out-of-Core SpMV (~1.8 min, ~1.7 GB RAM) with Near-Linear Scaling
Daniil Strizhov
Abstract
Handling large-scale sparse matrices is a fundamental task in many scientific and engineering domains, yet standard in-memory approaches often run up against the limits of available RAM. This paper introduces FlexChunk, an algorithm that combines a chunking strategy with disk caching to perform sparse matrix-vector multiplication (SpMV) efficiently on matrices that exceed available memory. Our experiments show that FlexChunk achieves near-linear O(N) time complexity and linear memory scaling up to matrix sizes of 100M×100M, processing the largest case in 1 minute 47 seconds with a peak memory footprint of 1.7 GB. A comparison with the optimized SciPy library reveals a key trade-off: SciPy computes significantly faster for matrices that fit in memory, whereas FlexChunk excels in scenarios requiring disk I/O, owing to its substantially lower data-loading times. The primary contribution is demonstrating FlexChunk as a scalable, memory-efficient solution that enables SpMV on problem sizes previously intractable due to memory constraints.
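To make the chunking-plus-disk-caching idea concrete, the sketch below shows one way an out-of-core SpMV can be organized: the matrix is split into horizontal row bands, each band is cached to disk, and the product is computed by streaming one band into RAM at a time. This is a minimal illustrative sketch, not the FlexChunk implementation; the chunk layout, `.npz` file format, and the helper names `save_row_chunks` and `chunked_spmv` are assumptions of this example.

```python
# Illustrative out-of-core chunked SpMV. The chunk layout, file format, and
# function names here are assumptions for this sketch, not FlexChunk itself.
import os
import tempfile

import numpy as np
from scipy import sparse


def save_row_chunks(mat, chunk_rows, out_dir):
    """Split a CSR matrix into horizontal row bands and cache each to disk."""
    paths = []
    for start in range(0, mat.shape[0], chunk_rows):
        band = mat[start:start + chunk_rows].tocsr()
        path = os.path.join(out_dir, f"chunk_{start}.npz")
        sparse.save_npz(path, band)
        paths.append(path)
    return paths


def chunked_spmv(paths, x):
    """Compute y = A @ x one disk-cached row band at a time.

    Peak memory holds a single band plus the dense vectors,
    never the full matrix.
    """
    parts = []
    for path in paths:
        band = sparse.load_npz(path)  # only this band resides in RAM
        parts.append(band @ x)
        del band
    return np.concatenate(parts)


rng = np.random.default_rng(0)
A = sparse.random(1000, 1000, density=0.01, format="csr", random_state=rng)
x = rng.standard_normal(1000)

with tempfile.TemporaryDirectory() as d:
    chunk_paths = save_row_chunks(A, chunk_rows=128, out_dir=d)
    y = chunked_spmv(chunk_paths, x)

# The streamed result matches the in-memory product.
assert np.allclose(y, A @ x)
```

Because each band is multiplied and released independently, peak memory scales with the band size rather than the matrix size, which is the property the paper's 1.7 GB footprint at 100M×100M reflects.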