Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL
Saurabh Deochake, Debajyoti Mukhopadhyay
Abstract
While Text-to-SQL systems achieve high accuracy, existing efficiency metrics like the Valid Efficiency Score prioritize execution time, a metric we show is fundamentally decoupled from consumption-based cloud billing. This paper evaluates cloud query execution cost trade-offs between reasoning and non-reasoning Large Language Models by performing 180 Text-to-SQL query executions across six LLMs on Google BigQuery using the 230 GB StackOverflow dataset. Our analysis reveals that reasoning models process 44.5% fewer bytes than non-reasoning counterparts while maintaining equivalent correctness at 96.7% to 100%, and that execution time correlates weakly with query cost at r=0.16, indicating that speed optimization does not imply cost efficiency. Non-reasoning models also exhibit extreme cost variance of up to 3.4×, producing outliers exceeding 36 GB per query, over 20× the best model's 1.8 GB average, due to missing partition filters and inefficient joins. We identify these prevalent inefficiency patterns and provide deployment guidelines to mitigate financial risks in cost-sensitive enterprise environments.
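The distinction between execution time and billed cost can be illustrated with a minimal sketch. It assumes consumption-based billing is proportional to bytes processed; the price per TiB, the helper names (`on_demand_cost`, `pearson_r`), and the sample measurements are hypothetical placeholders, not values or code from the paper.

```python
import math

TIB = 2 ** 40   # bytes per tebibyte
GB = 10 ** 9    # bytes per gigabyte (decimal GB is an assumption here)


def on_demand_cost(bytes_processed: float, price_per_tib: float) -> float:
    """Cost of one query when billing is proportional to bytes processed."""
    return bytes_processed / TIB * price_per_tib


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical price; check the provider's current on-demand rate.
PRICE_PER_TIB = 6.25

# Hypothetical measurements for four generated queries (not from the paper).
gb_processed = [1.8, 36.0, 2.0, 1.9]
seconds = [4.0, 5.0, 9.0, 3.0]

costs = [on_demand_cost(gb * GB, PRICE_PER_TIB) for gb in gb_processed]
r = pearson_r(seconds, costs)

# Ratio of the outlier to the best average quoted in the abstract.
outlier_ratio = 36.0 / 1.8

print(f"costs (USD): {[round(c, 4) for c in costs]}")
print(f"time-cost correlation r = {r:.2f}")
print(f"outlier ratio = {outlier_ratio:.0f}x")
```

Under this billing model, a query that scans 36 GB costs roughly twenty times as much as one that scans 1.8 GB regardless of how long either takes to run, which is why a time-based metric can rank queries differently from the invoice.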