Dialogue
Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation.
Papers
| Title | Status | Hype |
|---|---|---|
| TextBox 2.0: A Text Generation Library with Pre-trained Language Models | Code | 3 |
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | BART (TextBox 2.0) | BLEU-1 | 49.58 | — | Unverified |
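The only automatic metric listed above is BLEU-1. For unigrams, BLEU reduces to clipped unigram precision multiplied by a brevity penalty. Below is a minimal sketch of standard corpus-level BLEU-1 with no smoothing. It assumes the text is already tokenized; the function name `corpus_bleu1` is hypothetical. TextBox 2.0 may tokenize, average, or scale scores differently, so this sketch is not guaranteed to reproduce the claimed 49.58.

```python
import math
from collections import Counter


def corpus_bleu1(hypotheses, references):
    """Corpus-level BLEU-1: clipped unigram precision times brevity penalty.

    Hypothetical helper (a sketch, not TextBox 2.0's implementation).
    hypotheses: list of token lists, one per generated response.
    references: list of lists of token lists (one or more references per hypothesis).
    Returns a score in [0, 1]; multiply by 100 for the 0-100 scale used in tables.
    """
    matches = 0   # clipped unigram matches summed over the corpus
    hyp_len = 0   # total hypothesis length (c)
    ref_len = 0   # total effective reference length (r)
    for hyp, refs in zip(hypotheses, references):
        # Max count of each token in any single reference, used for clipping.
        max_ref_counts = Counter()
        for ref in refs:
            for tok, cnt in Counter(ref).items():
                max_ref_counts[tok] = max(max_ref_counts[tok], cnt)
        # A hypothesis token is credited at most as often as it appears in one reference.
        matches += sum(min(cnt, max_ref_counts[tok]) for tok, cnt in Counter(hyp).items())
        hyp_len += len(hyp)
        # Effective reference length: the reference length closest to the hypothesis
        # length, with ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    if hyp_len == 0 or matches == 0:
        return 0.0
    precision = matches / hyp_len
    # Brevity penalty: penalize corpora whose outputs are shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * precision


hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(100 * corpus_bleu1([hyp], [[ref]]), 2))
```

In this example, 5 of the 6 hypothesis unigrams match the reference and both sides have length 6, so the brevity penalty is 1 and the score is 83.33 on the 0-100 scale. Because scores are computed at corpus level, summing matches and lengths over all responses before dividing differs from averaging per-response scores. This is one reason why reported BLEU numbers are hard to compare across toolkits.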