
Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models

2019-05-01 · ICLR 2019

Huan Zhang, Hai Zhao


Abstract

Sequence-to-sequence (seq2seq) models have become a popular framework for neural sequence prediction. While traditional seq2seq models are trained by Maximum Likelihood Estimation (MLE), much recent work has attempted to optimize evaluation scores directly, addressing the mismatch between training and evaluation: model predictions are usually judged by a task-specific metric such as BLEU or ROUGE rather than by perplexity. This paper is the first to organize this existing work into two categories, (a) minimum divergence and (b) maximum margin. Based on an analysis of existing work, we introduce a new training criterion and empirically compare models from the two categories. Our experimental results show that the new training criterion usually outperforms existing methods on both machine translation and sentence summarization.
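The contrast the abstract draws between MLE training and metric-driven training can be sketched on a toy example. The snippet below is an illustrative assumption, not the paper's actual method: it compares (a) the MLE loss, the negative log-likelihood of the single reference, with (b) a risk-style minimum-divergence objective, the expected metric shortfall under the model distribution over a small candidate set. All candidate strings, logits, and metric values are made up for illustration.

```python
import math

# Toy candidate space: each candidate carries a model score (logit) and a
# task metric (e.g. a sentence-level BLEU-like score) against the reference.
# These numbers are illustrative only.
candidates = [
    {"text": "the cat sat", "logit": 2.0, "metric": 1.0},  # the reference
    {"text": "a cat sat",   "logit": 1.5, "metric": 0.7},
    {"text": "the dog ran", "logit": 0.5, "metric": 0.1},
]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

probs = softmax([c["logit"] for c in candidates])

# (a) MLE: minimize the negative log-likelihood of the reference alone;
# the evaluation metric plays no role in this loss.
mle_loss = -math.log(probs[0])

# (b) Minimum-divergence / risk-style objective: minimize the expected
# "badness" (1 - metric) under the model distribution, which ties the
# training signal directly to the evaluation metric.
risk_loss = sum(p * (1.0 - c["metric"]) for p, c in zip(probs, candidates))

print(f"MLE loss:  {mle_loss:.4f}")
print(f"Risk loss: {risk_loss:.4f}")
```

Note how the risk objective rewards probability mass placed on high-metric candidates even when they are not the exact reference, which is the intuition behind training directly on BLEU or ROUGE.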
