Learned Data Compression: Challenges and Opportunities for the Future

2024-12-14

Qiyu Liu, Siyuan Han, Jianwei Liao, Jin Li, Jingshu Peng, Jun Du, Lei Chen

Abstract

Compressing integer keys is a fundamental operation across multiple communities, such as database management (DB), information retrieval (IR), and high-performance computing (HPC). Recent advances in learned indexes have inspired the development of learned compressors, which leverage simple yet compact machine learning (ML) models to compress large-scale sorted keys. The core idea behind learned compressors is to losslessly encode sorted keys by approximating them with error-bounded ML models (e.g., piecewise linear functions) and using a residual array to guarantee accurate key reconstruction. While the concept of learned compressors remains in its early stages of exploration, our benchmark results demonstrate that a SIMD-optimized learned compressor can significantly outperform state-of-the-art CPU-based compressors. Drawing on our preliminary experiments, this vision paper explores the potential of learned data compression to enhance critical areas in DBMS and related domains. Furthermore, we outline the key technical challenges that existing systems must address when integrating this emerging methodology.
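
The core idea in the abstract (an error-bounded piecewise linear model plus a residual array) can be illustrated with a small Python sketch. This is not the authors' implementation: the greedy "shrinking cone" segmentation, the function names `compress` and `decompress`, and the `(start, first_key, slope)` segment layout are all assumptions chosen for illustration, and the sketch omits SIMD and the bit-packing of residuals that a real compressor would rely on.

```python
from typing import List, Tuple

# Hypothetical segment layout: (start index, key at start, slope).
Segment = Tuple[int, int, float]


def _predict(seg: Segment, i: int) -> int:
    """Model prediction for position i, rounded to an integer."""
    start, first_key, slope = seg
    return first_key + round(slope * (i - start))


def compress(keys: List[int], eps: int) -> Tuple[List[Segment], List[int]]:
    """Greedily fit line segments so each key lies within about eps of its line.

    Returns the segments plus one integer residual per key. The residuals
    make reconstruction exact regardless of floating-point rounding.
    """
    segments: List[Segment] = []
    residuals: List[int] = []
    n = len(keys)
    start = 0
    while start < n:
        lo, hi = float("-inf"), float("inf")  # feasible slope interval
        end = start + 1
        while end < n:
            dx = end - start
            dy = keys[end] - keys[start]
            new_lo = max(lo, (dy - eps) / dx)
            new_hi = min(hi, (dy + eps) / dx)
            if new_lo > new_hi:  # no single slope fits this point: close segment
                break
            lo, hi = new_lo, new_hi
            end += 1
        # A one-point segment has no slope constraint; use 0.0 for it.
        slope = 0.0 if end == start + 1 else (lo + hi) / 2
        seg = (start, keys[start], slope)
        segments.append(seg)
        residuals.extend(keys[i] - _predict(seg, i) for i in range(start, end))
        start = end
    return segments, residuals


def decompress(segments: List[Segment], residuals: List[int]) -> List[int]:
    """Rebuild the keys exactly as prediction + residual."""
    keys: List[int] = []
    n = len(residuals)
    for j, seg in enumerate(segments):
        end = segments[j + 1][0] if j + 1 < len(segments) else n
        keys.extend(_predict(seg, i) + residuals[i] for i in range(seg[0], end))
    return keys
```

In this sketch the space saving would come from storing a few segments plus residuals that each need only about log2(2 * eps + 1) bits, instead of full-width keys. A larger `eps` yields fewer segments but wider residuals, so the choice of error bound is a trade-off rather than a fixed constant.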

Tasks

CPU, Data Compression, Information Retrieval
