DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression

2026-03-20Unverified0· sign in to hype

Xiaoming Yu, Shize Tang, Guanghua Yu, Linchuan Xie, Song Liu, Jianchen Zhu, Feng Li

Unverified — Be the first to reproduce this paper.

Abstract

We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately corrupt the small-magnitude parameter deltas (ΔW) that encode post-training behavior -- an effect we analyze through the lens of quantization as implicit regularization. DAQ replaces reconstruction-based objectives with two delta-aware metrics -- Sign Preservation Rate and Cosine Similarity -- that directly optimize for directional fidelity of ΔW, requiring only the base and post-trained weight matrices. In a pilot FP8 study, DAQ recovers style-specific capabilities lost under standard quantization while maintaining general performance.

DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression

Abstract

Reproductions