SOTAVerified

Dialogue Generation

Dialogue generation is the natural language processing task of interpreting natural language input in order to produce a natural language response. Such systems are usually intended for conversing with humans, for instance in back-and-forth dialogue with a conversational agent such as a chatbot. Example benchmarks for this task (see also related tasks such as Natural Language Understanding) include FusedChat and the Ubuntu Dialogue Corpus (UDC). Models can be evaluated with metrics such as BLEU, ROUGE, and METEOR, although these correlate only weakly with human judgement; newer metrics such as UnSupervised and Reference-free (USR) and Metric for automatic Unreferenced dialog evaluation (MaUde) aim to address this.
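To make the point about word-overlap metrics concrete, here is a minimal sketch of a unigram-overlap score in the spirit of the F1 and BLEU-1 figures reported on leaderboards like this one. It is an illustration only: the function name `unigram_overlap`, the whitespace tokenization, and the example sentences are assumptions, and published results typically rely on each benchmark's own tokenizer and scoring script.

```python
from collections import Counter


def unigram_overlap(hypothesis: str, reference: str) -> dict:
    """Return clipped unigram precision, recall, and F1 for one response.

    Hypothetical helper for illustration; tokenization is plain
    lowercased whitespace splitting, which is an assumption.
    """
    hyp_tokens = hypothesis.lower().split()
    ref_tokens = reference.lower().split()
    if not hyp_tokens or not ref_tokens:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    # Counter intersection keeps the minimum count per token, so a word
    # repeated in the hypothesis is only credited as often as it appears
    # in the reference (the "clipping" used by BLEU-style precision).
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Partial lexical overlap yields a middling score.
scores = unigram_overlap("i love my dog", "i love dogs")

# A perfectly reasonable reply that shares no words with the reference
# scores zero, which is one reason such metrics can disagree with humans.
paraphrase = unigram_overlap("sure sounds great", "yes that works for me")
```

In this sketch the second call returns an F1 of zero even though a human would likely accept the reply, which is the kind of mismatch that reference-free metrics such as USR and MaUde are meant to reduce.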

Papers

Showing 150 of 606 papers

| Title | Status | Hype |
| --- | --- | --- |
| ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching | Code | 4 |
| NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist | Code | 3 |
| SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Code | 2 |
| Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search | Code | 2 |
| LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Code | 2 |
| CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations | Code | 2 |
| CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models | Code | 2 |
| PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain | Code | 2 |
| SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization | Code | 2 |
| CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI | Code | 2 |
| A Large-Scale Chinese Short-Text Conversation Dataset | Code | 2 |
| TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents | Code | 2 |
| VisTai: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan | Code | 1 |
| SAGE: Steering and Refining Dialog Generation with State-Action Augmentation | Code | 1 |
| ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models | Code | 1 |
| SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks | Code | 1 |
| DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling | Code | 1 |
| MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | Code | 1 |
| CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding | Code | 1 |
| Selective Prompting Tuning for Personalized Conversations with LLMs | Code | 1 |
| ESCoT: Towards Interpretable Emotional Support Dialogue Systems | Code | 1 |
| Modeling Low-Resource Health Coaching Dialogues via Neuro-Symbolic Goal Summarization and Text-Units-Text Generation | Code | 1 |
| Target-constrained Bidirectional Planning for Generation of Target-oriented Proactive Dialogue | Code | 1 |
| Evaluating Very Long-Term Conversational Memory of LLM Agents | Code | 1 |
| Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue | Code | 1 |
| Parameter-Efficient Conversational Recommender System as a Language Processing Task | Code | 1 |
| ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences | Code | 1 |
| PRODIGy: a PROfile-based DIalogue Generation dataset | Code | 1 |
| NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes | Code | 1 |
| MIRACLE: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control | Code | 1 |
| Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation | Code | 1 |
| Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference | Code | 1 |
| VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions | Code | 1 |
| Medical Dialogue Generation via Dual Flow Modeling | Code | 1 |
| Improving Empathetic Dialogue Generation by Dynamically Infusing Commonsense Knowledge | Code | 1 |
| RefGPT: Dialogue Generation of GPT, by GPT, and for GPT | Code | 1 |
| Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization | Code | 1 |
| Enhancing Personalized Dialogue Generation with Contrastive Latent Variables: Combining Sparse and Dense Persona | Code | 1 |
| Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence Modeling | Code | 1 |
| Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue | Code | 1 |
| Controllable Mixed-Initiative Dialogue Generation through Prompting | Code | 1 |
| White-Box Multi-Objective Adversarial Attack on Dialogue Generation | Code | 1 |
| Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory | Code | 1 |
| Elastic Weight Removal for Faithful and Abstractive Dialogue Generation | Code | 1 |
| G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment | Code | 1 |
| GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation | Code | 1 |
| Learning to Memorize Entailment and Discourse Relations for Persona-Consistent Dialogues | Code | 1 |
| SESCORE2: Learning Text Generation Evaluation via Synthesizing Realistic Mistakes | Code | 1 |
| Large Language Models Meet Harry Potter: A Bilingual Dataset for Aligning Dialogue Agents with Characters | Code | 1 |
| Terminology-aware Medical Dialogue Generation | Code | 1 |

Benchmark Results

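| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | LMEDR | Avg F1 | 21.99 | | Unverified |
| 2 | P^2 Bot | Avg F1 | 19.77 | | Unverified |
| 3 | TransferTransfo | Avg F1 | 19.09 | | Unverified |
| 4 | Seq2Seq + Attention | Avg F1 | 16.18 | | Unverified |
| 5 | Synthesizer (R+V) | BLEU-1 | 14.7 | | Unverified |
| 6 | KV Profile Memory | Avg F1 | 11.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Classification-based model | Slot Accuracy | 0.97 | | Unverified |
| 2 | Two-in-one model | Slot Accuracy | 0.97 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | EVA | mauve | 0.97 | | Unverified |
| 2 | Per-BOB | mauve | 0.95 | | Unverified |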

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | mm | 1 in 10 R@2 | 5 | | Unverified |
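
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ∞-former (Sticky memories) | F1 | 9.01 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ∞-former (Sticky memories + initialized GPT-2 Small) | Perplexity | 32.48 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SpaceFusion | interest (human) | 2.53 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MrRNN Act.-Ent. | F1 | 4.63 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MrRNN Act.-Ent. | Accuracy | 34.48 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MrRNN Act.-Ent. | F1 | 11.43 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MrRNN Act.-Ent. | Accuracy | 95.04 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MrRNN Act.-Ent. | F1 | 3.72 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MrRNN Act.-Ent. | Accuracy | 29.01 | | Unverified |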