Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
Peng Shi, Patrick Ng, Zhiguo Wang, Henghui Zhu, Alexander Hanbo Li, Jun Wang, Cicero Nogueira dos Santos, Bing Xiang
Code Available
- github.com/awslabs/gap-text2sql (PyTorch) ★ 109
- github.com/dmirlab-group/sadga (PyTorch) ★ 35
- github.com/c4ai/gap-text2sql (PyTorch) ★ 30
Abstract
Recently, there has been significant interest in learning contextual representations for various NLP tasks by leveraging large-scale text corpora to train large neural language models with self-supervised learning objectives, such as Masked Language Modeling (MLM). However, based on a pilot study, we observe three issues with existing general-purpose language models when they are applied to text-to-SQL semantic parsing: they fail to detect column mentions in the utterances, fail to infer column mentions from cell values, and fail to compose complex SQL queries. To mitigate these issues, we present a model pre-training framework, Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to produce pre-training data. GAP MODEL is trained on 2M utterance-schema pairs and 30K utterance-schema-SQL triples, whose utterances are produced by generative models. Based on experimental results, neural semantic parsers that leverage GAP MODEL as a representation encoder obtain new state-of-the-art results on both SPIDER and CRITERIA-TO-SQL benchmarks.
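The abstract describes an encoder that jointly represents an utterance and a table schema. The sketch below illustrates one plausible way to do that: flatten the schema into a text sequence and feed it together with the utterance through a BART-style encoder so attention can link column mentions in the question to schema items. This is a minimal illustration, not the authors' exact pipeline; the checkpoint name "facebook/bart-large", the example utterance, and the schema serialization format are assumptions (the released GAP weights are in the awslabs/gap-text2sql repository).

```python
# Minimal sketch: jointly encode an utterance and a serialized table schema.
# "facebook/bart-large" is a stand-in checkpoint, not the released GAP model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
encoder = AutoModel.from_pretrained("facebook/bart-large").encoder

utterance = "How many singers are older than 30?"
# Flatten the schema into one sequence: table names followed by their columns.
schema = "singer : singer_id , name , age , country | concert : concert_id , year"

# Encode the utterance and schema as a text pair so the encoder can attend
# across both, which is what allows grounding column mentions in the question.
inputs = tokenizer(utterance, schema, return_tensors="pt", truncation=True)
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_size)

print(hidden.shape)
```

The contextual token vectors over the schema span would then feed a downstream text-to-SQL parser (e.g., RAT-SQL) in place of a generic MLM-pre-trained encoder.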
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Spider | RAT-SQL + GAP | Exact match accuracy | 69.7 | — | Unverified |