Symbolic Regression with Multimodal Large Language Models and Kolmogorov Arnold Networks

2025-05-12

Thomas R. Harvey, Fabian Ruehle, Cristofero S. Fraser-Taliente, James Halverson

Code

github.com/harveyThomas4692/llmlex (JAX)

Abstract

We present a novel approach to symbolic regression using vision-capable large language models (LLMs) and the ideas behind Google DeepMind's FunSearch. The LLM is given a plot of a univariate function and tasked with proposing an ansatz for that function. The free parameters of the ansatz are fitted using standard numerical optimisers, and a collection of such ansätze make up the population of a genetic algorithm. Unlike other symbolic regression techniques, our method does not require the specification of a set of functions to be used in regression, but with appropriate prompt engineering, we can arbitrarily condition the generative step. By using Kolmogorov Arnold Networks (KANs), we demonstrate that "univariate is all you need" for symbolic regression, and extend this method to multivariate functions by learning the univariate function on each edge of a trained KAN. The combined expression is then simplified by further processing with a language model.
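
The fit-and-rank step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the llmlex API. The candidate list is hypothetical and hard-coded, standing in for ansätze that a vision-capable LLM might propose from a plot. A simple coordinate pattern search stands in for the standard numerical optimisers the abstract mentions. All names (CANDIDATES, fit, rank_population) are assumptions made for illustration.

```python
import math

# Hypothetical candidate ansaetze, standing in for LLM proposals.
# Each entry: (label, function of (x, params), number of free parameters).
CANDIDATES = [
    ("a*x + b", lambda x, p: p[0] * x + p[1], 2),
    ("a*x**2 + b*x + c", lambda x, p: p[0] * x * x + p[1] * x + p[2], 3),
    ("a*sin(b*x) + c", lambda x, p: p[0] * math.sin(p[1] * x) + p[2], 3),
]


def mse(f, params, xs, ys):
    """Mean squared error of ansatz f with the given parameters."""
    return sum((f(x, params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(f, n_params, xs, ys, min_step=1e-6):
    """Fit free parameters by coordinate pattern search (an illustrative
    stand-in for a standard optimiser): try +/- step on each parameter,
    keep any improvement, and halve the step when no move helps."""
    params = [1.0] * n_params
    best = mse(f, params, xs, ys)
    step = 0.5
    while step > min_step:
        improved = False
        for i in range(n_params):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                score = mse(f, trial, xs, ys)
                if score < best:
                    params, best, improved = trial, score, True
        if not improved:
            step /= 2
    return params, best


def rank_population(candidates, xs, ys):
    """Fit every ansatz and sort the population by fitted error, best first."""
    scored = []
    for label, f, n_params in candidates:
        params, score = fit(f, n_params, xs, ys)
        scored.append((score, label, params))
    return sorted(scored, key=lambda entry: entry[0])


# Synthetic univariate target, sampled as it might be read off a plot.
xs = [-3 + 6 * i / 60 for i in range(61)]
ys = [2.0 * math.sin(1.2 * x) + 0.5 for x in xs]

ranking = rank_population(CANDIDATES, xs, ys)
best_score, best_label, best_params = ranking[0]
print(best_label, [round(p, 3) for p in best_params])
```

In a genetic-algorithm loop of the kind the abstract describes, the top-ranked entries would presumably be fed back into the prompt so that the model can propose refined ansätze in the next generation. That feedback step is omitted here because it depends on the model interface.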

Tasks

Kolmogorov-Arnold Networks, Language Modelling, Prompt Engineering, Regression, Symbolic Regression
