Dialogue Generation
Dialogue generation is the natural language processing task of "understanding" natural language input in order to produce a natural language response. Such systems are usually intended for conversing with humans, for instance in back-and-forth dialogue with a conversational agent such as a chatbot. Example benchmarks for this task (see also related tasks such as Natural Language Understanding) include FusedChat and the Ubuntu Dialogue Corpus (UDC). Models can be evaluated with metrics such as BLEU, ROUGE, and METEOR, although these correlate only weakly with human judgement; newer metrics such as UnSupervised and Reference-free (USR) and Metric for automatic Unreferenced dialog evaluation (MaUde) aim to address this weakness.
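To make the point about word-overlap metrics concrete, here is a minimal sketch of a unigram-overlap F1 score between a generated response and a single reference. It assumes lowercased whitespace tokenization and one reference per response; the function name `unigram_f1` and the example sentences are hypothetical, and the benchmark entries listed on this page may compute their F1 scores differently.

```python
from collections import Counter


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Word-overlap F1 between a generated response and one reference.

    Assumption: lowercased whitespace tokenization, which is a
    simplification of what real evaluation scripts do.
    """
    hyp_tokens = hypothesis.lower().split()
    ref_tokens = reference.lower().split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a reference token can be matched at most as many
    # times as it occurs in the reference.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentences, not taken from any benchmark.
reference = "i am doing great thanks for asking"
close_copy = "i am doing great thanks"
paraphrase = "pretty good how about you"

print(unigram_f1(close_copy, reference))  # high: most words are shared
print(unigram_f1(paraphrase, reference))  # 0.0: no words are shared
```

The second response is a perfectly reasonable reply, yet it scores zero because it shares no words with the reference. This is one way the weak correlation with human judgement can arise, and it is the motivation for reference-free metrics such as USR and MaUde.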
Datasets: Persona-Chat, FusedChat, Harry Potter Dialogue Dataset, Amazon-5, CMU DoG, PG-19, Reddit (multi-ref), Twitter Dialogue (Noun), Twitter Dialogue (Tense), Ubuntu Dialogue (Activity), Ubuntu Dialogue (Cmd), Ubuntu Dialogue (Entity)
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LMEDR | Avg F1 | 21.99 | — | Unverified |
| 2 | P^2 Bot | Avg F1 | 19.77 | — | Unverified |
| 3 | TransferTransfo | Avg F1 | 19.09 | — | Unverified |
| 4 | Seq2Seq + Attention | Avg F1 | 16.18 | — | Unverified |
| 5 | Synthesizer (R+V) | BLEU-1 | 14.7 | — | Unverified |
| 6 | KV Profile Memory | Avg F1 | 11.9 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Classification-based model | Slot Accuracy | 0.97 | — | Unverified |
| 2 | Two-in-one model | Slot Accuracy | 0.97 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | mm | 1 in 10 R@2 | 5 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ∞-former (Sticky memories) | F1 | 9.01 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ∞-former (Sticky memories + initialized GPT-2 Small) | Perplexity | 32.48 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SpaceFusion | interest (human) | 2.53 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MrRNN Act.-Ent. | F1 | 4.63 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MrRNN Act.-Ent. | Accuracy | 34.48 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MrRNN Act.-Ent. | F1 | 11.43 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MrRNN Act.-Ent. | Accuracy | 95.04 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MrRNN Act.-Ent. | F1 | 3.72 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MrRNN Act.-Ent. | Accuracy | 29.01 | — | Unverified |