TQ-DiT: Efficient Time-Aware Quantization for Diffusion Transformers

2025-02-06

Younghye Hwang, Hyojin Lee, Joonhyuk Kang

Abstract

Diffusion transformers (DiTs) combine transformer architectures with diffusion models. However, their computational complexity limits their use in real-time applications and undermines the sustainability of AI systems. In this study, we aim to improve computational efficiency through model quantization, which represents weights and activation values at lower precision. Multi-region quantization (MRQ) is introduced to address the asymmetric distribution of network values in DiT blocks by allocating two scaling parameters to sub-regions. Additionally, time-grouping quantization (TGQ) is proposed to reduce the quantization error caused by temporal variation in activations. The experimental results show that the proposed algorithm achieves performance comparable to the original full-precision model, with only a 0.29 increase in FID at W8A8 (8-bit weights and 8-bit activations). Furthermore, it outperforms other baselines at W6A6, confirming its suitability for low-bit quantization. These results highlight the potential of our method to enable efficient real-time generative models.
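
The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names, written under stated assumptions rather than taken from the paper. It assumes non-negative values that cluster near zero, a single hypothetical `split` point separating two sub-regions that each get their own scale, one bit spent on selecting the region, and timesteps grouped into equal contiguous ranges. All function and parameter names (`two_region_quantize`, `group_index`, `calibrate_group_ranges`, `split`, `x_max`) are illustrative and do not come from the paper.

```python
def uniform_quantize(x, scale, n_levels):
    """Snap a non-negative float to one of n_levels uniform steps of size scale."""
    q = min(max(round(x / scale), 0), n_levels - 1)
    return q * scale


def two_region_quantize(values, split, x_max, bits=8):
    """Hypothetical two-region quantizer (assumption, not the paper's exact MRQ).

    Values below `split` use a fine scale and values at or above it use a
    coarse scale. One bit is assumed to select the region, so each region
    gets 2 ** (bits - 1) levels.
    """
    n_levels = 2 ** (bits - 1)
    scale_lo = split / n_levels        # fine steps for the dense region near zero
    scale_hi = x_max / (n_levels - 1)  # coarse steps covering the full range
    out = []
    for x in values:
        scale = scale_lo if x < split else scale_hi
        out.append(uniform_quantize(x, scale, n_levels))
    return out


def group_index(t, num_timesteps, num_groups):
    """Map timestep t in [0, num_timesteps) to one of num_groups contiguous groups."""
    return min(t * num_groups // num_timesteps, num_groups - 1)


def calibrate_group_ranges(acts_by_t, num_groups):
    """Return one calibrated maximum magnitude per timestep group.

    acts_by_t holds one list of activation samples per timestep. Equal-width
    grouping is an assumption made for this sketch.
    """
    num_timesteps = len(acts_by_t)
    x_max = [0.0] * num_groups
    for t, acts in enumerate(acts_by_t):
        g = group_index(t, num_timesteps, num_groups)
        x_max[g] = max(x_max[g], max(abs(a) for a in acts))
    return x_max
```

In this sketch, each timestep group would keep its own calibrated range, and that range could then be passed as `x_max` to `two_region_quantize` when quantizing activations from timesteps in that group.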

Tasks

Computational Efficiency, Quantization
