Generating Synthetic Wildlife Health Data from Camera Trap Imagery: A Pipeline for Alopecia and Body Condition Training Data
David Brundage
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
No publicly available, ML ready datasets exist for wildlife health conditions in camera trap imagery, creating a fundamental barrier to automated health screening. We present a pipeline for generating synthetic training images depicting alopecia and body condition deterioration in wildlife from real camera trap photographs. Our pipeline constructs a curated base image set from iWildCam using MegaDetector derived bounding boxes and center frame weighted stratified sampling across 8 North American species. A generative phenotype editing system produces controlled severity variants depicting hair loss consistent with mange and emaciation. An adaptive scene drift quality control system uses a sham prefilter and decoupled mask then score approach with complementary day or night metrics to reject images where the generative model altered the original scene. We frame the pipeline explicitly as a screening data source. From 201 base images across 4 species, we generate 553 QC passing synthetic variants with an overall pass rate of 83 percent. A sim to real transfer experiment training exclusively on synthetic data and testing on real camera trap images of suspected health conditions achieves 0.85 AUROC, demonstrating that the synthetic data captures visual features sufficient for screening.