
Response Generation

Response generation is a task in which an agent plays the $DE$ role and generates text in response to a $P$ message.

Papers

Showing 21–30 of 914 papers

Title	Date	Tasks	Status	Hype
GE-Chat: A Graph Enhanced RAG Framework for Evidential Response Generation of LLMs	May 15, 2025	RAG, Response Generation	Unverified	0
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs	May 15, 2025	Benchmarking, Fairness	Unverified	0
Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge Graph	May 15, 2025	Knowledge Graphs, RAG	Code Available	0
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents	May 2, 2025	Instruction Following, Response Generation	Unverified	0
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception	Apr 29, 2025	counterfactual, Hallucination	Code Available	1
Deep Learning Characterizes Depression and Suicidal Ideation from Eye Movements	Apr 29, 2025	Deep Learning, Response Generation	Unverified	0
PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight	Apr 26, 2025	Mixture-of-Experts, PICO	Unverified	0
Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant	Apr 25, 2025	Natural Language Understanding, Response Generation	Code Available	0
Beyond Whole Dialogue Modeling: Contextual Disentanglement for Conversational Recommendation	Apr 24, 2025	Conversational Recommendation, counterfactual	Unverified	0
LegalRAG: A Hybrid RAG System for Multilingual Legal Information Retrieval	Apr 19, 2025	Information Retrieval, Question Answering	Unverified	0


Datasets: SIMMC2.0, ArgSciChat, MMConv

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PaCE	BLEU	34.1	—	Unverified
2	BART-large	BLEU	33.1	—	Unverified
3	BART-base	BLEU	29.4	—	Unverified
4	MTN	BLEU	21.7	—	Unverified
5	GPT-2	BLEU	19.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	LED(Q,F)	Message-F1	19.54	—	Unverified
2	LED(Q,P,H)	Message-F1	16.14	—	Unverified
3	LED(Q,P)	Message-F1	14.25	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaCE	BLEU	22	—	Unverified
2	SimpleTOD	BLEU	20.3	—	Unverified