SOTAVerified

Dialogue Generation

Dialogue generation is the natural language processing task of "understanding" natural language inputs in order to produce output. Such systems are usually intended for conversing with humans, for instance in back-and-forth dialogue with a conversational agent such as a chatbot. Example benchmarks for this task (see also related tasks such as Natural Language Understanding) include FusedChat and the Ubuntu Dialogue Corpus (UDC). Models can be evaluated with metrics such as BLEU, ROUGE, and METEOR, although these correlate only weakly with human judgement; newer metrics such as UnSupervised and Reference-free (USR) and Metric for automatic Unreferenced dialog evaluation (MaUde) aim to address this.
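To illustrate why word-overlap metrics can disagree with human judgement, here is a minimal sketch of a unigram F1 score between a generated response and a single reference. It is a simplified illustration under stated assumptions (lowercasing, whitespace tokenization, one reference) and not the official scoring code of any benchmark listed on this page; the function name `unigram_f1` and the example sentences are hypothetical.

```python
from collections import Counter


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Word-overlap F1 between a generated response and one reference.

    Assumption: lowercased, whitespace-split tokens. Real benchmarks
    usually apply their own normalization and tokenization.
    """
    hyp_tokens = hypothesis.lower().split()
    ref_tokens = reference.lower().split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: a shared word counts at most as often
    # as it appears in both the hypothesis and the reference.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "i am doing well thanks"
# An exact copy of the reference gets the maximum score.
print(unigram_f1("i am doing well thanks", reference))
# A reasonable reply that shares no words with the reference scores zero.
print(unigram_f1("pretty good today", reference))
```

The second call shows the weakness: a reply a human might accept receives no credit because it shares no words with the reference, which is one reason reference-free metrics have been proposed.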

Papers

Showing 1-25 of 606 papers

Title | Status | Hype
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching | Code | 4
NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist | Code | 3
CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI | Code | 2
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents | Code | 2
PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain | Code | 2
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations | Code | 2
SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Code | 2
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Code | 2
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search | Code | 2
CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models | Code | 2
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization | Code | 2
A Large-Scale Chinese Short-Text Conversation Dataset | Code | 2
Adding Chit-Chat to Enhance Task-Oriented Dialogues | Code | 1
Controllable Mixed-Initiative Dialogue Generation through Prompting | Code | 1
Controlling Dialogue Generation with Semantic Exemplars | Code | 1
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences | Code | 1
A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation | Code | 1
CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding | Code | 1
Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances | Code | 1
BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining | Code | 1
AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation | Code | 1
BanglaNLG and BanglaT5: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla | Code | 1
A Batch Normalized Inference Network Keeps the KL Vanishing Away | Code | 1
CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation | Code | 1
A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | LMEDR | Avg F1 | 21.99 | | Unverified
2 | P^2 Bot | Avg F1 | 19.77 | | Unverified
3 | TransferTransfo | Avg F1 | 19.09 | | Unverified
4 | Seq2Seq + Attention | Avg F1 | 16.18 | | Unverified
5 | Synthesizer (R+V) | BLEU-1 | 14.7 | | Unverified
6 | KV Profile Memory | Avg F1 | 11.9 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Classification-based model | Slot Accuracy | 0.97 | | Unverified
2 | Two-in-one model | Slot Accuracy | 0.97 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | EVA | mauve | 0.97 | | Unverified
2 | Per-BOB | mauve | 0.95 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | mm | 1 in 10 R@2 | 5 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ∞-former (Sticky memories) | F1 | 9.01 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ∞-former (Sticky memories + initialized GPT-2 Small) | Perplexity | 32.48 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SpaceFusion | interest (human) | 2.53 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MrRNN Act.-Ent. | F1 | 4.63 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MrRNN Act.-Ent. | Accuracy | 34.48 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MrRNN Act.-Ent. | F1 | 11.43 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MrRNN Act.-Ent. | Accuracy | 95.04 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MrRNN Act.-Ent. | F1 | 3.72 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MrRNN Act.-Ent. | Accuracy | 29.01 | | Unverified