Heterogeneous Transfer Learning for Building High-Dimensional Generalized Linear Models with Disparate Datasets

2023-12-20

Ruzhang Zhao, Prosenjit Kundu, Arkajyoti Saha, Nilanjan Chatterjee

Code

github.com/ruzhangzhao/htlgmm
Official implementation, linked in the paper

Abstract

Development of comprehensive prediction models is often of great interest in many disciplines of science, but datasets with information on all desired features often have small sample sizes. We describe a transfer learning approach for building high-dimensional generalized linear models using data from a main study with detailed information on all predictors and an external, potentially much larger, study that has ascertained a more limited set of predictors. We propose using the external dataset to build a reduced model and then "transfer" the information on underlying parameters for the analysis of the main study through a set of calibration equations that can account for the study-specific effects of design variables. We then propose a penalized generalized method of moments framework for inference and a one-step estimation method that can be implemented using the standard glmnet package. We develop asymptotic theory and conduct extensive simulation studies to investigate both the predictive performance and the post-selection inference properties of the proposed method. Finally, we illustrate an application of the proposed method to the development of risk models for five common diseases using the UK Biobank study, combining information on low-dimensional risk factors and high-throughput proteomic biomarkers.
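
The calibration idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation and does not use the htlgmm package: it assumes a Gaussian linear model, a low-dimensional setting with no penalty, and a simple quadratic objective with a hypothetical tuning weight `c`. All names (`simulate`, `calibration_gap`, `beta_cal`) are illustrative only. It fits a reduced model on the external study, forms calibration moments in the main study, and pulls the full-model estimate toward values that satisfy them.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, beta, p_z):
    """Draw (X, y) from a Gaussian linear model; the first p_z columns are the shared predictors Z."""
    p = beta.size
    Z = rng.normal(size=(n, p_z))
    # W is correlated with Z, so the reduced model differs from the full model.
    W = 0.5 * Z[:, [0]] + rng.normal(size=(n, p - p_z))
    X = np.hstack([Z, W])
    y = X @ beta + rng.normal(size=n)
    return X, y

p_z, p_w = 3, 5
beta_true = np.concatenate([np.ones(p_z), 0.5 * np.ones(p_w)])

X_main, y_main = simulate(200, beta_true, p_z)    # small main study, all predictors
X_ext, y_ext = simulate(20000, beta_true, p_z)    # large external study
Z_main, Z_ext = X_main[:, :p_z], X_ext[:, :p_z]   # only Z is used from the external study

# Step 1: reduced model (y ~ Z) fitted on the external study by least squares.
theta_ext, *_ = np.linalg.lstsq(Z_ext, y_ext, rcond=None)

# Step 2: calibration moments in the main study,
#   (1/n) * sum_i Z_i * (X_i' beta - Z_i' theta_ext) = A @ beta - b.
n = X_main.shape[0]
A = Z_main.T @ X_main / n
b = Z_main.T @ Z_main @ theta_ext / n

def calibration_gap(beta):
    """Squared norm of the calibration moments at beta."""
    return float(np.sum((A @ beta - b) ** 2))

# Step 3: minimize ||y - X beta||^2 / n + c * ||A beta - b||^2 in closed form.
c = 10.0  # assumed weight on the calibration moments (hypothetical choice)
G = X_main.T @ X_main / n
rhs = X_main.T @ y_main / n
beta_ols = np.linalg.solve(G, rhs)
beta_cal = np.linalg.solve(G + c * A.T @ A, rhs + c * A.T @ b)

print("calibration gap, main study only:", calibration_gap(beta_ols))
print("calibration gap, calibrated fit: ", calibration_gap(beta_cal))
```

Because the calibrated estimate minimizes the least-squares loss plus a nonnegative multiple of the calibration gap, its gap can never exceed that of the main-study-only fit. The method described in the abstract additionally handles general link functions, study-specific design effects, and a penalty for high-dimensional predictors, none of which this sketch attempts.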

Tasks

Transfer Learning
