Uncovering bias in the PlantVillage dataset

2022-06-09

Mehmet Alican Noyan

Code

github.com/Ipsumio/plantvillage_bias
Official repository

Abstract

We report our investigation into the use of the popular PlantVillage dataset for training deep-learning-based plant disease detection models. We trained a machine learning model using only 8 pixels from the PlantVillage image backgrounds. The model achieved 49.0% accuracy on the held-out test set, well above the random-guessing accuracy of 2.6%. This result indicates that the PlantVillage dataset contains noise correlated with the labels and that deep learning models can easily exploit this bias to make predictions. We discuss possible approaches to alleviate this problem.
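
The 8-pixel feature extraction described in the abstract can be sketched as follows. The abstract does not say which 8 pixels were used, so this sketch assumes the four corners and the four edge midpoints of each image; the function names `border_pixels` and `chance_accuracy` are hypothetical. The 2.6% chance level is consistent with uniform guessing over 38 classes (1/38 ≈ 2.6%), the class count commonly reported for PlantVillage, which is also an assumption here.

```python
import numpy as np

NUM_CLASSES = 38  # assumption: class count commonly reported for PlantVillage


def border_pixels(image: np.ndarray) -> np.ndarray:
    """Return the RGB values of 8 border pixels as a flat feature vector.

    Assumed layout (not confirmed by the abstract): the 4 corners plus the
    midpoints of the 4 edges. `image` has shape (height, width, 3).
    """
    h, w = image.shape[:2]
    rows = [0, 0, 0, h // 2, h // 2, h - 1, h - 1, h - 1]
    cols = [0, w // 2, w - 1, 0, w - 1, 0, w // 2, w - 1]
    # Advanced indexing picks the 8 pixels; flatten to 8 x 3 = 24 values.
    return image[rows, cols].reshape(-1)


def chance_accuracy(num_classes: int) -> float:
    """Accuracy, in percent, of uniform random guessing."""
    return 100.0 / num_classes


# Usage on a dummy image standing in for a PlantVillage photo.
dummy = np.zeros((256, 256, 3), dtype=np.uint8)
print(border_pixels(dummy).shape)
print(round(chance_accuracy(NUM_CLASSES), 1))
```

Each image is reduced to 24 numbers that contain no leaf content at all, so any accuracy above the chance level must come from the background.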

Tasks

Bias Detection, Deep Learning, Image Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
PlantVillage_8px	RandomForest_default_hyperparameters	Accuracy (%)	49	—	Unverified
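
The benchmark row names a random forest with default hyperparameters trained on the 8-pixel features. A minimal sketch of that setup follows, assuming scikit-learn's `RandomForestClassifier` (the page does not name the library) and a 24-value feature vector per image (8 pixels, 3 channels). The helper `evaluate_random_forest`, the 80/20 split, and the synthetic data are all hypothetical stand-ins, not the real dataset or the author's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate_random_forest(features, labels, seed=0):
    """Train a default random forest and return held-out accuracy in percent."""
    # Assumed 80/20 stratified split; the page does not state the proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    # Only the seed is set; every other hyperparameter keeps its default.
    model = RandomForestClassifier(random_state=seed)
    model.fit(X_train, y_train)
    return 100.0 * accuracy_score(y_test, model.predict(X_test))


# Synthetic stand-in data: 400 "images", 24 background values each, 4 classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=400)
features = rng.integers(0, 256, size=(400, 24)).astype(float)
features[:, 0] = labels * 60  # one background value leaks the label

print(f"accuracy: {evaluate_random_forest(features, labels):.1f}%")
```

The leaking column plays the role of label-correlated background noise: the classifier scores well above chance without ever seeing anything resembling a leaf.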