Zero-Shot Text-to-Image Generation
2021-02-24
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
Code
- github.com/openai/DALL-E (official, in paper) — PyTorch, ★ 10,877
- github.com/neonbjb/tortoise-tts — PyTorch, ★ 14,824
- github.com/borisdayma/dalle-mini — JAX, ★ 14,783
- github.com/lucidrains/DALLE-pytorch — PyTorch, ★ 5,628
- github.com/kakaobrain/rq-vae-transformer — PyTorch, ★ 1,010
- github.com/liuqk3/put — PyTorch, ★ 198
- github.com/xyzforever/bevt — PyTorch, ★ 161
- github.com/explainingai-code/Dalle-Pytorch — PyTorch, ★ 13
- github.com/JoyPang123/Textmage — PyTorch, ★ 10
- github.com/teoaivalis/Search-Based_Data_Influence_Analysis — PyTorch, ★ 2
Abstract
Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
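The core data layout described in the abstract — text and image tokens concatenated into a single stream and modeled autoregressively — can be sketched as follows. This is a minimal illustration, not the official DALL-E code: the vocabulary size and token IDs are hypothetical, and the `build_stream` / `next_token_pairs` helpers are assumptions made for clarity.

```python
def build_stream(text_tokens, image_tokens, text_vocab=16384):
    """Concatenate text and image tokens into one sequence.

    Image token IDs are offset by the text vocabulary size so the two
    modalities can share a single embedding table without colliding.
    (The offset scheme here is an illustrative assumption.)
    """
    return text_tokens + [t + text_vocab for t in image_tokens]


def next_token_pairs(stream):
    """Autoregressive training pairs: predict token i+1 from tokens 0..i."""
    return [(stream[:i + 1], stream[i + 1]) for i in range(len(stream) - 1)]


# Hypothetical BPE text tokens and VQ image-codebook indices.
text = [12, 7, 305]
image = [4, 0, 9]

stream = build_stream(text, image)
# → [12, 7, 305, 16388, 16384, 16393]
pairs = next_token_pairs(stream)
```

A transformer trained on such pairs with a standard next-token cross-entropy loss learns both text continuation and text-conditional image-token generation from the same objective, which is what lets a single model cover the whole task without auxiliary losses.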