Text-To-SQL
Text-to-SQL is a natural language processing (NLP) task whose goal is to automatically generate SQL queries from natural language text. The input text is converted into a structured representation, which is then used to generate a semantically correct SQL query that can be executed on a database.
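As a rough illustration of the task, the sketch below pairs a natural language question with a SQL query and runs that query against an in-memory SQLite database. The `singer` table, its rows, the question, and the `predicted_sql` string are all hypothetical and written by hand for this example; in a real system the query would be produced by a model from the question and the schema.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (
        id INTEGER PRIMARY KEY,
        name TEXT,
        country TEXT,
        age INTEGER
    );
    INSERT INTO singer (name, country, age) VALUES
        ('Ana', 'France', 31),
        ('Bo', 'France', 45),
        ('Cy', 'Japan', 28);
""")

question = "How many singers are from France?"

# Stand-in for a model's output; a real text-to-SQL system would generate
# this string from the question and the database schema.
predicted_sql = "SELECT COUNT(*) FROM singer WHERE country = 'France'"

# Executing the query is what makes the output checkable against the data.
rows = conn.execute(predicted_sql).fetchall()
print(question, "->", rows)
```

Because the generated query is executable, its result can be compared with the result of a reference query, which is the idea behind execution-based metrics.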
Datasets: BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation), Spider, Spider 2.0, SParC, KaggleDBQA, SEDE, SQL-Eval, Text-To-SQL
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human Performance | Execution Accuracy (Human) | 92.96 | — | Unverified |
| 2 | XiYan-SQL | Execution Accuracy % (Test) | 75.63 | — | Unverified |
| 3 | DSAIR + GPT-4o | Execution Accuracy % (Test) | 74.12 | — | Unverified |
| 4 | CHASE-SQL + Gemini | Execution Accuracy % (Test) | 74.06 | — | Unverified |
| 5 | ExSL + granite-34b-code | Execution Accuracy % (Test) | 73.17 | — | Unverified |
| 6 | OpenSearch-SQL+ v2 + GPT-4o | Execution Accuracy % (Test) | 72.28 | — | Unverified |
| 7 | Distillery + GPT-4o | Execution Accuracy % (Test) | 71.83 | — | Unverified |
| 8 | Insights AI | Execution Accuracy % (Test) | 70.26 | — | Unverified |
| 9 | PURPLE + RED + GPT-4o | Execution Accuracy % (Test) | 70.21 | — | Unverified |
| 10 | MCTS-SQL | Execution Accuracy % (Test) | 69.4 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | XiYan-SQL | Execution Accuracy (Test) | 89.65 | — | Unverified |
| 2 | PET-SQL | Execution Accuracy (Test) | 87.6 | — | Unverified |
| 3 | datagpt-sql-7B + InvalidSQL-Feedback | Execution Accuracy (Dev) | 87.2 | — | Unverified |
| 4 | DAIL-SQL + GPT-4 + Self-Consistency | Execution Accuracy (Test) | 86.6 | — | Unverified |
| 5 | DIN-SQL + GPT-4 | Execution Accuracy (Test) | 85.3 | — | Unverified |
| 6 | datagpt-sql-7B | Execution Accuracy (Dev) | 84.8 | — | Unverified |
| 7 | MSc-SQL | Execution Accuracy (Test) | 84.7 | — | Unverified |
| 8 | MARLO + Claude 2.1 | Execution Accuracy (Test) | 84 | — | Unverified |
| 9 | C3 + ChatGPT + Zero-Shot | Execution Accuracy (Test) | 82.3 | — | Unverified |
| 10 | code-davinci-002 175B (LEVER) | Execution Accuracy (Dev) | 81.9 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Spider-Agent + o1-preview | Success Rate | 17.03 | — | Unverified |
| 2 | Spider-Agent + GPT-4o | Success Rate | 10.13 | — | Unverified |
| 3 | Spider-Agent + Claude-3.5-Sonnet | Success Rate | 9.02 | — | Unverified |
| 4 | Spider-Agent + GPT-4 | Success Rate | 8.86 | — | Unverified |
| 5 | Spider-Agent + Qwen2.5-72B | Success Rate | 6.17 | — | Unverified |
| 6 | Spider-Agent + DeepSeek-V2.5 | Success Rate | 5.22 | — | Unverified |
| 7 | Spider-Agent + Gemini-Pro-1.5 | Success Rate | 2.53 | — | Unverified |
| 8 | Spider-Agent + Llama-3.1-405B | Success Rate | 2.21 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RASAT+PICARD | interaction match accuracy | 45.2 | — | Unverified |
| 2 | RAT-SQL-TC + GAP | interaction match accuracy | 43.2 | — | Unverified |
| 3 | HIE-SQL + GraPPa | interaction match accuracy | 42.9 | — | Unverified |
| 4 | RAT-SQL + SCoRe | interaction match accuracy | 38.1 | — | Unverified |
| 5 | EditSQL + BERT | interaction match accuracy | 25.3 | — | Unverified |
| 6 | GAZP + BERT | interaction match accuracy | 23.5 | — | Unverified |
| 7 | SyntaxSQL-con | interaction match accuracy | 5.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | T5-Large | PCM-F1 (dev) | 48.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | XiYan-SQL | Execution Accuracy | 69.86 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Orange-mini | 0-shot MRR | 74.17 | — | Unverified |
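
Several of the tables above report execution accuracy. A minimal sketch of how such a metric can be computed is shown below, assuming that a prediction counts as correct when it returns the same multiset of rows as the reference query on the same database. The helper names `execution_match` and `execution_accuracy`, the toy table `t`, and the query pairs are hypothetical; the official evaluation scripts of the individual benchmarks may apply different comparison rules.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same multiset of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Counter ignores row order but keeps duplicate rows distinct.
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


# Toy database and query pairs, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    INSERT INTO t (x) VALUES (1), (2), (3);
""")

pairs = [
    ("SELECT x FROM t WHERE x > 1", "SELECT x FROM t WHERE x >= 2"),  # same rows
    ("SELECT x FROM t", "SELECT x FROM t WHERE x > 1"),               # extra row
    ("SELEC x FROM t", "SELECT x FROM t"),                            # syntax error
    ("SELECT COUNT(*) FROM t", "SELECT 3"),                           # same rows
]

accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.1f}%")
```

Comparing result sets rather than query strings lets two differently written but equivalent queries both count as correct, at the cost of occasionally accepting a wrong query that happens to return the same rows on the test database.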