Efficient and Stable Reinforcement Learning for Diffusion Language Models
Jiawei Liu, Xiting Wang, Yuanyuan Zhong, Defu Lian, Yu Yang
Abstract
Reinforcement Learning (RL) is crucial for unlocking the complex reasoning capabilities of Diffusion-based Large Language Models (dLLMs). However, applying RL to dLLMs faces unique challenges in efficiency and stability. To address these challenges, we propose Spatio-Temporal Pruning (STP), a framework designed to simultaneously improve the efficiency and stability of RL for dLLMs. STP compresses redundancy in the generative process through: (1) spatial pruning, which constrains the exploration space using static priors; and (2) temporal pruning, which bypasses redundant late-stage refinement steps. Our theoretical analysis shows that STP strictly reduces the variance of the log-likelihood estimation, thereby ensuring more stable policy updates. Extensive experiments demonstrate that STP surpasses state-of-the-art baselines in both efficiency and accuracy. Our code is available at https://github.com/Lolo1222/STP.
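To make the two pruning mechanisms concrete, below is a minimal, illustrative sketch of how spatial and temporal pruning could be wired into a dLLM rollout. All names (`spatial_prune`, `rollout_with_stp`), the convergence-based stopping rule, and the greedy decoding are our own assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of Spatio-Temporal Pruning (STP) for a dLLM rollout.
# The heuristics below are illustrative assumptions, not the paper's method.
import torch


def spatial_prune(logits: torch.Tensor, prior_mask: torch.Tensor) -> torch.Tensor:
    """Spatial pruning: restrict the exploration space with a static prior.

    logits:     (seq_len, vocab_size) per-position token logits.
    prior_mask: (seq_len, vocab_size) boolean mask of tokens the static
                prior keeps; everything else is excluded from sampling.
    """
    return logits.masked_fill(~prior_mask, float("-inf"))


def rollout_with_stp(model, x, prior_mask, num_steps=64, tol=1e-3):
    """Temporal pruning: stop denoising once late-stage refinement steps
    barely change the token distribution (an assumed stopping rule)."""
    prev_probs = None
    for t in range(num_steps):
        logits = model(x, t)                        # one reverse-diffusion step
        logits = spatial_prune(logits, prior_mask)  # spatial pruning
        probs = logits.softmax(dim=-1)
        x = probs.argmax(dim=-1)                    # greedy unmasking, for brevity
        # Temporal pruning: bypass the remaining refinement steps
        # once successive distributions stop changing.
        if prev_probs is not None and (probs - prev_probs).abs().max() < tol:
            break
        prev_probs = probs
    return x


# Toy usage with a fixed random "model" (logits independent of x and t),
# so temporal pruning triggers on the second step.
vocab, seq = 16, 8
fixed_logits = torch.randn(seq, vocab)
toy_model = lambda x, t: fixed_logits
mask = torch.zeros(seq, vocab, dtype=torch.bool)
mask[:, :4] = True  # static prior keeps only the first 4 vocab entries per position
out = rollout_with_stp(toy_model, torch.zeros(seq, dtype=torch.long), mask)
```

In this sketch, spatial pruning shrinks the per-position candidate set before sampling, and temporal pruning cuts the rollout short when further denoising is redundant; both reduce the number of stochastic choices per trajectory, which is consistent with the paper's claim of lower-variance log-likelihood estimates.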