JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs

2025-02-17

Aliaksandra Shysheya, John Bronskill, James Requeima, Shoaib Ahmed Siddiqui, Javier Gonzalez, David Duvenaud, Richard E. Turner

Code

github.com/cambridge-mlg/jolt (official, in paper; PyTorch; ★ 16)
github.com/requeima/llm_processes (PyTorch; ★ 69)

Abstract

We introduce a simple method for probabilistic predictions on tabular data based on Large Language Models (LLMs) called JoLT (Joint LLM Process for Tabular data). JoLT uses the in-context learning capabilities of LLMs to define joint distributions over tabular data conditioned on user-specified side information about the problem, exploiting the vast repository of latent problem-relevant knowledge encoded in LLMs. JoLT defines joint distributions for multiple target variables with potentially heterogeneous data types without any data conversion, data preprocessing, special handling of missing data, or model training, making it accessible and efficient for practitioners. Our experiments show that JoLT outperforms competitive methods on low-shot single-target and multi-target tabular classification and regression tasks. Furthermore, we show that JoLT can automatically handle missing data and perform data imputation by leveraging textual side information. We argue that due to its simplicity and generality, JoLT is an effective approach for a wide variety of real prediction problems.
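
The abstract does not specify a prompt format, so the following is only a rough sketch of how tabular rows with heterogeneous and missing values might be turned into text for in-context prediction. It is not taken from the authors' code. The function names (`serialize_row`, `build_prompt`), the `name: value` layout, the choice to skip missing cells, and the example columns are all assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence

Row = Mapping[str, Optional[object]]


def serialize_row(row: Row, columns: Sequence[str]) -> str:
    """Render one row as 'name: value' pairs (hypothetical format).

    Cells that are absent or None are skipped, so no imputation or
    type conversion is applied before the text reaches the model.
    """
    parts = [f"{c}: {row[c]}" for c in columns if row.get(c) is not None]
    return "; ".join(parts)


def build_prompt(
    examples: Sequence[Row],
    query: Row,
    features: Sequence[str],
    targets: Sequence[str],
    side_info: str = "",
) -> str:
    """Assemble an in-context prompt that ends at the first target slot."""
    lines = []
    if side_info:
        # Free-text description of the problem, placed before the data.
        lines.append(side_info)
    all_columns = list(features) + list(targets)
    for example in examples:
        lines.append(serialize_row(example, all_columns))
    # The query row carries features only; the first target is left open
    # for the language model to complete.
    prefix = serialize_row(query, features)
    slot = f"{targets[0]}:"
    lines.append(f"{prefix}; {slot}" if prefix else slot)
    return "\n".join(lines)
```

Under these assumptions, a joint prediction over several targets could follow the chain rule, p(y1, y2 | x) = p(y1 | x) p(y2 | y1, x): sample a value for the first target from the model, append it together with the next target's name, and sample again. The sampling step depends on the LLM interface in use and is left out here.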

Tasks

Imputation, In-Context Learning, Tabular Classification
