Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques

2023-05-27

Daking Rai, Bailin Wang, Yilun Zhou, Ziyu Yao


Code

github.com/dakingrai/ood-generalization-semantic-boundary-techniques
Official · In paper · PyTorch · ★ 13

Abstract

Compositional and domain generalization present significant challenges in semantic parsing, even for state-of-the-art semantic parsers based on pre-trained language models (LMs). In this study, we empirically investigate improving an LM's generalization in semantic parsing with two simple techniques: at the token level, we introduce a token preprocessing method to preserve the semantic boundaries of tokens produced by LM tokenizers; at the sequence level, we propose to use special tokens to mark the boundaries of components aligned between input and output. Our experimental results on two text-to-SQL semantic parsing datasets show that our token preprocessing, although simple, can substantially improve the LM performance on both types of generalization, and our component boundary marking method is particularly helpful for compositional generalization.
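
The abstract describes both techniques only at a high level. The Python sketch below shows one plausible reading of each. It is not the authors' released implementation. The specific preprocessing rules (adding spaces around the dot in `table.column` references and around underscores in snake_case names), the marker strings `[sepN]`/`[/sepN]`, and the function names `preprocess_tokens` and `mark_boundaries` are all hypothetical, chosen for illustration.

```python
import re

# HYPOTHETICAL token-level preprocessing: insert whitespace at semantic
# boundaries so a subword tokenizer cannot merge pieces across them.
def preprocess_tokens(text: str) -> str:
    # Dot notation: "t1.name" -> "t1 . name". The dot must be followed by a
    # letter or underscore, so numeric literals such as "3.5" are untouched.
    text = re.sub(r"(?<=[A-Za-z0-9_])\.(?=[A-Za-z_])", " . ", text)
    # snake_case: "pet_age" -> "pet _ age". The underscore is kept as its own
    # token so the rewrite stays reversible.
    text = re.sub(r"(?<=[A-Za-z0-9])_(?=[A-Za-z0-9])", " _ ", text)
    return text


# HYPOTHETICAL sequence-level boundary marking. spans[i] is the
# (start, end) token range (end exclusive) of aligned component i.
# Component i receives the same marker id in the question and in the SQL,
# even if the components appear in a different order in the two sequences.
def mark_boundaries(tokens: list[str], spans: list[tuple[int, int]]) -> list[str]:
    out: list[str] = []
    prev = 0
    # Emit spans in left-to-right order while keeping their alignment id i.
    for i, (start, end) in sorted(enumerate(spans), key=lambda p: p[1][0]):
        out.extend(tokens[prev:start])
        out.append(f"[sep{i}]")
        out.extend(tokens[start:end])
        out.append(f"[/sep{i}]")
        prev = end
    out.extend(tokens[prev:])
    return out


if __name__ == "__main__":
    print(preprocess_tokens("select t1.name from singer as t1 order by t1.pet_age"))
    # select t1 . name from singer as t1 order by t1 . pet _ age

    question = "show the names of singers older than 30".split()
    sql = "select name from singer where age > 30".split()
    print(" ".join(mark_boundaries(question, [(1, 5), (5, 8)])))
    # show [sep0] the names of singers [/sep0] [sep1] older than 30 [/sep1]
    print(" ".join(mark_boundaries(sql, [(0, 4), (4, 8)])))
    # [sep0] select name from singer [/sep0] [sep1] where age > 30 [/sep1]
```

Keeping the underscore as a separate token, rather than replacing it with a space, lets a predicted SQL string be mapped back to the exact schema names. This page does not say whether the paper's preprocessing makes the same choice. In practice, the marker strings would also need to be registered as special tokens with the tokenizer so that they are not split into subwords.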

Tasks

Domain Generalization, Language Modeling, Semantic Parsing, Text-to-SQL

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
spider	T5-3B+NatSQL+Token Preprocessing	Execution Accuracy (Test)	78	—	Unverified
