Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation

2025-05-19

Connor Lennon, Edward Rubin, Glen Waddell


Abstract

Machine learning (ML) primarily evolved to solve "prediction problems." The first stage of two-stage least squares (2SLS) is a prediction problem, suggesting potential gains from ML first-stage assistance. However, little guidance exists on when ML helps 2SLS or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction and causal-inference settings, as well as their interaction. Through simulation, we show linear ML methods (e.g., post-Lasso) work well, while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates, potentially exceeding the bias of endogenous OLS.
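
To make the setup concrete, here is a minimal sketch of 2SLS with a swappable first stage. It is not the authors' simulation design: the data-generating process, coefficient values, and function names (`simulate`, `two_stage`, `linear_first_stage`, `overfit_first_stage`) are all illustrative assumptions. The "overfit" first stage is a stand-in for a learner that interpolates the training data in-sample. It is not a random forest or neural net, only the limiting case in which the fitted values reproduce the endogenous regressor exactly.

```python
import numpy as np


def simulate(n=5000, beta=1.0, seed=0):
    """Toy data with one instrument z and one endogenous regressor x (assumed DGP)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # instrument, independent of u
    u = rng.normal(size=n)                       # unobserved confounder
    x = 0.8 * z + u + rng.normal(size=n)         # x depends on z and on u
    y = beta * x + 2.0 * u + rng.normal(size=n)  # u also drives y, so OLS is biased
    return z, x, y


def slope(a, b):
    """OLS slope from regressing b on a, with an intercept."""
    design = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(design, b, rcond=None)
    return coef[1]


def linear_first_stage(z, x):
    """Fitted values from a linear regression of x on z."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return design @ coef


def overfit_first_stage(z, x):
    """Limiting case of an in-sample interpolating learner: x_hat equals x."""
    return x.copy()


def two_stage(z, x, y, first_stage):
    """Second-stage slope of y on the first-stage fitted values."""
    x_hat = first_stage(z, x)
    return slope(x_hat, y)


z, x, y = simulate()
print("OLS:                 ", slope(x, y))
print("2SLS, linear stage:  ", two_stage(z, x, y, linear_first_stage))
print("2SLS, overfit stage: ", two_stage(z, x, y, overfit_first_stage))
```

Under these assumptions the linear first stage should land near the true coefficient of 1.0, while the interpolating first stage reproduces the OLS estimate exactly, because its fitted values carry all of the endogenous variation in `x`. This shows only one mechanism by which a flexible first stage can reintroduce bias and does not reproduce the paper's three-component decomposition.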

Tasks

Causal Inference, Prediction
