SPROUT: A Scalable Diffusion Foundation Model for Agricultural Vision

2026-03-29

Shuai Xiang, Wei Guo, James Burridge, Shouyang Liu, Hao Lu, Tokihiro Fukatsu

Code

github.com/utokyo-fieldphenomics-lab/sprout

Abstract

Vision Foundation Models (VFMs) pre-trained on large-scale unlabeled data have achieved remarkable success on general computer vision tasks, yet they typically suffer from significant domain gaps when applied to agriculture. To address this, we introduce SPROUT (Scalable Plant Representation model via Open-field Unsupervised Training), a multi-crop, multi-task agricultural foundation model trained via diffusion denoising. SPROUT uses a VAE-free pixel-space Diffusion Transformer to learn rich, structure-aware representations through denoising while enabling efficient end-to-end training. We pre-train SPROUT on a curated dataset of 2.6 million high-quality agricultural images spanning diverse crops, growth stages, and environments. Extensive experiments demonstrate that SPROUT consistently outperforms state-of-the-art web-pretrained and agricultural foundation models across a wide range of downstream tasks, while requiring substantially lower pre-training cost. The code and model are available at https://github.com/UTokyo-FieldPhenomics-Lab/SPROUT.
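
As a rough illustration of what "VAE-free pixel-space denoising" can look like in practice, the sketch below corrupts raw pixels with Gaussian noise, splits them directly into patch tokens (no latent encoder), and scores a model on how well it recovers the clean patches. It is not taken from the SPROUT repository: the linear noise schedule, the clean-pixel prediction target, the patch size, and the names `patchify`, `add_noise`, `denoising_loss`, and `model` are all assumptions made for illustration.

```python
import numpy as np


def patchify(images, patch):
    """Split (B, H, W, C) pixel images into (B, N, patch*patch*C) tokens.

    Tokens are built straight from pixels; no VAE encoder is involved.
    """
    b, h, w, c = images.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    gh, gw = h // patch, w // patch
    x = images.reshape(b, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, gh, gw, patch, patch, C)
    return x.reshape(b, gh * gw, patch * patch * c)


def add_noise(images, t, rng):
    """Blend clean pixels (t=0) with Gaussian noise (t=1).

    The linear blend is an assumed schedule, not the paper's.
    """
    noise = rng.standard_normal(images.shape)
    t = t.reshape(-1, 1, 1, 1)
    return (1.0 - t) * images + t * noise, noise


def denoising_loss(model, images, rng, patch=16):
    """Mean squared error between predicted and clean patch tokens.

    `model(tokens, t)` is a hypothetical callable standing in for a
    Diffusion Transformer; predicting clean pixels is an assumption.
    """
    t = rng.uniform(size=images.shape[0])
    noisy, _ = add_noise(images, t, rng)
    pred = model(patchify(noisy, patch), t)
    target = patchify(images, patch)
    return float(np.mean((pred - target) ** 2))
```

In a real training loop, `model` would be a trainable network and this loss would be minimized with a gradient-based optimizer; the NumPy version here only shows the data flow.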
