Recurrent Diffusion for Large-Scale Parameter Generation
Kai Wang, Dongwen Tang, Wangbo Zhao, Yang You
Code: github.com/nus-hpc-ai-lab/recurrent-parameter-generation
Abstract
Parameter generation has long struggled to scale, significantly limiting its range of applications. In this study, we introduce Recurrent diffusion for large-scale Parameter Generation (RPG). We first divide the trained parameters into non-overlapping parts, and then propose a recurrent model to learn their relationships. The recurrent model's outputs are then fed, as conditions, into a diffusion model that generates the neural network parameters. Using only a single GPU, recurrent diffusion enables us to generate parameters for popular vision and language models, such as ConvNeXt-L and LoRA parameters of LLaMA-7B. Moreover, across various architectures and tasks, the generated parameters consistently achieve performance comparable to that of trained networks. Notably, our approach also shows the potential to generate models for unseen tasks, which greatly increases the practicality of parameter generation. Our code is available here.
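The pipeline in the abstract (chunk trained parameters into non-overlapping parts, run a recurrent model over the parts, and condition a diffusion denoiser on the recurrent outputs) can be sketched as below. This is a minimal illustration, not the paper's implementation: the chunk size, hidden width, LSTM conditioner, and MLP denoiser are all assumptions for demonstration.

```python
# Hypothetical sketch of an RPG-style pipeline; module choices and shapes
# are illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

CHUNK = 64  # size of each non-overlapping parameter part (assumption)

def chunk_params(flat_params: torch.Tensor, chunk: int = CHUNK) -> torch.Tensor:
    """Split a flat parameter vector into non-overlapping parts, zero-padding the tail."""
    pad = (-flat_params.numel()) % chunk
    flat = torch.cat([flat_params, flat_params.new_zeros(pad)])
    return flat.view(-1, chunk)                 # (num_parts, chunk)

class RecurrentConditioner(nn.Module):
    """Recurrent model that learns relationships between parameter parts."""
    def __init__(self, chunk: int = CHUNK, hidden: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(chunk, hidden, batch_first=True)

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(parts.unsqueeze(0))   # (1, num_parts, hidden)
        return out.squeeze(0)                   # one condition per part

class ConditionalDenoiser(nn.Module):
    """Toy diffusion denoiser: predicts noise for each part given its condition."""
    def __init__(self, chunk: int = CHUNK, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(chunk + hidden, 256), nn.SiLU(),
            nn.Linear(256, chunk),
        )

    def forward(self, noisy_parts: torch.Tensor, conditions: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([noisy_parts, conditions], dim=-1))

# One forward pass: chunk -> recurrent conditions -> conditioned denoising step.
params = torch.randn(1000)                      # stand-in for trained weights
parts = chunk_params(params)                    # (16, 64) after padding
cond = RecurrentConditioner()(parts)            # (16, 128)
noise_pred = ConditionalDenoiser()(torch.randn_like(parts), cond)
print(noise_pred.shape)                         # torch.Size([16, 64])
```

At generation time, a trained denoiser of this shape would be iterated over diffusion timesteps and the denoised parts concatenated (and unpadded) back into a flat parameter vector.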