SOTAVerified

Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents

2025-09-29Code Available0· sign in to hype

Khanh Trinh Pham, Thu Huong Nguyen, Jun Jo, Quoc Viet Hung Nguyen, Thanh Tam Nguyen

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

Text-to-SQL enables natural access to databases, yet most benchmarks are English-only, limiting multilingual progress. We introduce MultiSpider 2.0, extending Spider 2.0 to eight languages (English, German, French, Spanish, Portuguese, Japanese, Chinese, Vietnamese). It preserves Spider 2.0's structural difficulty while adding linguistic and dialectal variability, demanding deeper reasoning for complex SQL. On this benchmark, state-of-the-art LLMs (such as DeepSeek-R1 and OpenAI o1) reach only 4\% execution accuracy when relying on intrinsic reasoning, versus 60\% on MultiSpider 1.0. Therefore, we provide a collaboration-driven language agents baseline that iteratively refines queries, improving accuracy to 15\%. These results reveal a substantial multilingual gap and motivate methods that are robust across languages and ready for real-world enterprise deployment. Our benchmark is available at https://github.com/phkhanhtrinh23/Multilingual_Text_to_SQL.

Reproductions