Identifying Well-formed Natural Language Questions

2018-08-28 · EMNLP 2018

Manaal Faruqui, Dipanjan Das

Code

github.com/google-research-datasets/query-wellformedness

Abstract

Understanding search queries is a hard problem as it involves dealing with "word salad" text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-well-formed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.
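The task is binary classification over queries, so the first practical step is turning the released data into (query, label) pairs. The sketch below assumes each row holds a query and a tab-separated rating in [0, 1] (the fraction of annotators who judged it well-formed) and assumes a 0.8 cutoff, i.e. at least 4 of 5 annotators; both the file layout and the threshold are assumptions to check against the repository's README. The sample queries are illustrative, not taken from the dataset.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Assumed cutoff: a query counts as well-formed when its rating is >= 0.8
# (at least 4 of 5 annotators). Verify against the dataset documentation.
WELLFORMED_THRESHOLD = 0.8


def load_labeled_queries(lines: Iterable[str],
                         threshold: float = WELLFORMED_THRESHOLD) -> List[Tuple[str, int]]:
    """Parse assumed 'query<TAB>rating' rows into (query, label) pairs.

    label is 1 for well-formed and 0 for non-well-formed.
    """
    examples = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed rows
        query, rating = row[0].strip(), float(row[1])
        examples.append((query, 1 if rating >= threshold else 0))
    return examples


# Illustrative rows only (hypothetical queries and ratings).
sample = io.StringIO(
    "what is the capital of france ?\t1.0\n"
    "capital france\t0.0\n"
    "how tall is mount everest\t0.6\n"
)
for query, label in load_labeled_queries(sample):
    print(label, query)
```

Binarizing the rating with a threshold keeps the task a plain two-class problem, which matches the accuracy metric reported for it; a different cutoff would change the label distribution.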

Tasks

Query Wellformedness

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Query Wellformedness	word-1, 2 POS-1, 2, 3	Accuracy	70.7	—	Unverified
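The model name in the table reads as a feature set: word unigrams and bigrams plus part-of-speech unigrams, bigrams and trigrams. The sketch below shows one way to extract such a bag of n-gram features from a pre-tagged query and how the accuracy metric is computed. The POS tagger, the Penn Treebank-style tags in the example, the feature naming scheme, and the classifier that would consume these features are all assumptions; none of them is shown here as the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(items: Sequence[str], n: int) -> List[str]:
    """Contiguous n-grams joined with '_' (empty if the sequence is shorter than n)."""
    return ["_".join(items[i:i + n]) for i in range(len(items) - n + 1)]


def extract_features(tagged: Sequence[Tuple[str, str]]) -> Counter:
    """Bag of word 1-2 grams and POS 1-3 grams for one pre-tagged query.

    `tagged` is a list of (word, pos) pairs; the tagger that produces them
    is an assumption and is not part of this sketch.
    """
    words = [w.lower() for w, _ in tagged]
    tags = [t for _, t in tagged]
    feats = Counter()
    for n in (1, 2):
        feats.update(f"w{n}:{g}" for g in ngrams(words, n))
    for n in (1, 2, 3):
        feats.update(f"p{n}:{g}" for g in ngrams(tags, n))
    return feats


def accuracy(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of queries whose predicted label matches the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical tagged query; the tags are illustrative.
query = [("what", "WP"), ("is", "VBZ"), ("the", "DT"), ("capital", "NN")]
print(sorted(extract_features(query)))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

POS n-grams let a classifier pick up syntactic patterns (for example, a wh-word followed by a verb) that generalize across vocabulary, which is one plausible reason to combine them with word n-grams.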
