Recurrent Diffusion for Large-Scale Parameter Generation
Kai Wang, Dongwen Tang, Wangbo Zhao, Yang You
Code: github.com/nus-hpc-ai-lab/recurrent-parameter-generation
Abstract
Parameter generation has long struggled to scale, significantly limiting its range of applications. In this study, we introduce Recurrent diffusion for large-scale Parameter Generation (RPG). We first divide the trained parameters into non-overlapping parts, and then propose a recurrent model to learn their relationships. The recurrent model's outputs are then fed, as conditions, into a diffusion model that generates the neural network parameters. Using only a single GPU, recurrent diffusion enables us to generate parameters for popular vision and language models, such as ConvNeXt-L and LoRA parameters of LLaMA-7B. Moreover, across various architectures and tasks, the generated parameters consistently achieve performance comparable to that of trained networks. Notably, our approach also shows the potential to generate models for unseen tasks, which greatly increases the practicality of parameter generation. Our code is available here.
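The pipeline in the abstract (chunk trained parameters into non-overlapping parts, run a recurrent model over the parts, and condition a diffusion denoiser on the recurrent outputs) can be sketched as below. This is a minimal illustration, not the paper's implementation: the chunk size, hidden width, LSTM conditioner, and MLP denoiser are all assumptions for demonstration.

```python
# Hypothetical sketch of an RPG-style pipeline; module choices and shapes
# are illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

CHUNK = 64  # size of each non-overlapping parameter part (assumption)

def chunk_params(flat_params: torch.Tensor, chunk: int = CHUNK) -> torch.Tensor:
    """Split a flat parameter vector into non-overlapping parts, zero-padding the tail."""
    pad = (-flat_params.numel()) % chunk
    flat = torch.cat([flat_params, flat_params.new_zeros(pad)])
    return flat.view(-1, chunk)                 # (num_parts, chunk)

class RecurrentConditioner(nn.Module):
    """Recurrent model that learns relationships between parameter parts."""
    def __init__(self, chunk: int = CHUNK, hidden: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(chunk, hidden, batch_first=True)

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(parts.unsqueeze(0))   # (1, num_parts, hidden)
        return out.squeeze(0)                   # one condition per part

class ConditionalDenoiser(nn.Module):
    """Toy diffusion denoiser: predicts noise for each part given its condition."""
    def __init__(self, chunk: int = CHUNK, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(chunk + hidden, 256), nn.SiLU(),
            nn.Linear(256, chunk),
        )

    def forward(self, noisy_parts: torch.Tensor, conditions: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([noisy_parts, conditions], dim=-1))

# One forward pass: chunk -> recurrent conditions -> conditioned denoising step.
params = torch.randn(1000)                      # stand-in for trained weights
parts = chunk_params(params)                    # (16, 64) after padding
cond = RecurrentConditioner()(parts)            # (16, 128)
noise_pred = ConditionalDenoiser()(torch.randn_like(parts), cond)
print(noise_pred.shape)                         # torch.Size([16, 64])
```

At generation time, a trained denoiser of this shape would be iterated over diffusion timesteps and the denoised parts concatenated (and unpadded) back into a flat parameter vector.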