Response Generation

A task in which an agent plays the $DE$ role and generates text in response to a $P$ message.

Papers

Showing 181–190 of 914 papers

Title	Date	Tasks	Status	Hype
Void in Language Models	May 20, 2025	MMLU, Response Generation	Code Available	0
Multi-Armed Bandits Meet Large Language Models	May 19, 2025	Decision Making, Multi-Armed Bandits	Unverified	0
Rethinking Stateful Tool Use in Multi-Turn Dialogues: Benchmarks and Challenges	May 19, 2025	Response Generation	Unverified	0
ProDS: Preference-oriented Data Selection for Instruction Tuning	May 19, 2025	Response Generation	Unverified	0
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs	May 15, 2025	Benchmarking, Fairness	Unverified	0
GE-Chat: A Graph Enhanced RAG Framework for Evidential Response Generation of LLMs	May 15, 2025	RAG, Response Generation	Unverified	0
Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge Graph	May 15, 2025	Knowledge Graphs, RAG	Code Available	0
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents	May 2, 2025	Instruction Following, Response Generation	Unverified	0
Deep Learning Characterizes Depression and Suicidal Ideation from Eye Movements	Apr 29, 2025	Deep Learning, Response Generation	Unverified	0
PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight	Apr 26, 2025	Mixture-of-Experts, PICO	Unverified	0

Datasets: SIMMC2.0, ArgSciChat, MMConv

Benchmark Results

SIMMC2.0

#	Model	Metric	Claimed	Verified	Status
1	PaCE	BLEU	34.1	—	Unverified
2	BART-large	BLEU	33.1	—	Unverified
3	BART-base	BLEU	29.4	—	Unverified
4	MTN	BLEU	21.7	—	Unverified
5	GPT-2	BLEU	19.2	—	Unverified

ArgSciChat

#	Model	Metric	Claimed	Verified	Status
1	LED(Q,F)	Message-F1	19.54	—	Unverified
2	LED(Q,P,H)	Message-F1	16.14	—	Unverified
3	LED(Q,P)	Message-F1	14.25	—	Unverified

MMConv

#	Model	Metric	Claimed	Verified	Status
1	PaCE	BLEU	22	—	Unverified
2	SimpleTOD	BLEU	20.3	—	Unverified