Inference for Regression with Variables Generated by AI or Machine Learning

2024-02-23

Laura Battaglia, Timothy Christensen, Stephen Hansen, Szymon Sacher

Abstract

Researchers now routinely use AI or other machine learning methods to estimate latent variables of economic interest, then plug in the estimates as covariates in a regression. We show both theoretically and empirically that naively treating AI/ML-generated variables as "data" leads to biased estimates and invalid inference. To restore valid inference, we propose two methods: (1) an explicit bias correction with bias-corrected confidence intervals, and (2) joint estimation of the regression parameters and latent variables. We illustrate these ideas through applications involving label imputation, dimensionality reduction, and index construction via classification and aggregation.
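
The plug-in problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not implement either proposed correction. It assumes a deliberately simple setting, classical additive measurement error in a single generated regressor, which is only one way an estimated variable can differ from the latent one. The function name `simulate_plugin_bias` and all parameter values are hypothetical choices made for this illustration.

```python
import numpy as np


def simulate_plugin_bias(n=20_000, beta=1.0, noise_sd=1.0, seed=0):
    """Compare OLS slopes using the true latent variable vs. a noisy stand-in.

    Assumption (illustrative only): the generated variable equals the latent
    variable plus independent Gaussian noise. Real AI/ML estimation error
    need not have this form.
    """
    rng = np.random.default_rng(seed)

    theta = rng.normal(size=n)                               # latent variable, variance 1
    theta_hat = theta + rng.normal(scale=noise_sd, size=n)   # noisy "generated" variable
    y = beta * theta + rng.normal(size=n)                    # outcome depends on the latent variable

    def ols_slope(x, outcome):
        # Slope from a simple regression of outcome on x with an intercept.
        x_c = x - x.mean()
        return float(x_c @ (outcome - outcome.mean()) / (x_c @ x_c))

    oracle = ols_slope(theta, y)       # infeasible: uses the true latent variable
    plugin = ols_slope(theta_hat, y)   # feasible: treats the estimate as data
    return oracle, plugin


oracle, plugin = simulate_plugin_bias()
print(f"oracle slope:  {oracle:.3f}")
print(f"plug-in slope: {plugin:.3f}")
```

Under these assumptions the plug-in slope is pulled toward zero by the factor Var(theta) / (Var(theta) + noise_sd^2), which is 0.5 for the defaults, while the oracle slope stays near the true coefficient. Conventional standard errors computed around the plug-in slope would be centered on the wrong value, which is the sense in which inference becomes invalid in this toy case.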
