The impact of lexical and grammatical processing on generating code from natural language
2022-02-28 · Findings (ACL) 2022
Nathanaël Beau, Benoît Crabbé
Code
- gitlab.com/codegenfact/BertranX (official, in paper, PyTorch)
- gitlab.com/codegenfactors/BertranX (official, PyTorch)
Abstract
Considering the seq2seq architecture of TranX for natural language to code translation, we identify four key components of importance: grammatical constraints, lexical preprocessing, input representations, and copy mechanisms. To study the impact of these components, we use a state-of-the-art architecture that relies on a BERT encoder and a grammar-based decoder, for which a formalization is provided. The paper highlights the importance of the lexical substitution component in current natural language to code systems.
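To make the idea of lexical substitution concrete, the sketch below replaces quoted spans in a natural language intent with numbered placeholders and maps them back into the generated code afterwards. It is an illustration under assumptions, not the preprocessing used in the paper: the function names `substitute` and `restore`, the `var_N` placeholder scheme, and the quoting rules are all hypothetical.

```python
import re

# Assumption: literals and identifiers in the intent are marked with
# backticks, single quotes, or double quotes. Apostrophes in ordinary
# words (e.g. "don't") are not handled by this simplified pattern.
_QUOTED = re.compile(r"`([^`]+)`|'([^']+)'|\"([^\"]+)\"")


def substitute(intent):
    """Replace quoted spans with placeholders; return (new_intent, slots)."""
    slots = {}  # placeholder name -> original surface string

    def repl(match):
        value = next(g for g in match.groups() if g is not None)
        # Reuse the same placeholder when a value appears more than once.
        for name, seen in slots.items():
            if seen == value:
                return name
        name = f"var_{len(slots)}"
        slots[name] = value
        return name

    return _QUOTED.sub(repl, intent), slots


def restore(code, slots):
    """Put the original strings back into code that contains placeholders."""
    # Longer names first, so that var_1 does not clobber part of var_10.
    for name in sorted(slots, key=len, reverse=True):
        code = code.replace(name, slots[name])
    return code


intent, slots = substitute("sort list `xs` by key 'age'")
predicted = "sorted(var_0, key=lambda d: d['var_1'])"  # stand-in for model output
restored = restore(predicted, slots)
```

With this scheme the model only ever sees a small, closed set of placeholder tokens instead of open-vocabulary names and literals; the `predicted` string here is hand-written and merely stands in for what a decoder might emit.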
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| CoNaLa | TranX + BERT w/mined | BLEU | 34.2 | — | Unverified |
| Django | TranX + BERT w/mined | Accuracy | 81.03 | — | Unverified |