SOTAVerified

Response Generation

A task where an agent should play the $DE$ role and generate a text to respond to a $P$ message.

Papers

Showing 26-50 of 914 papers

Title | Status | Hype
Deep Learning Characterizes Depression and Suicidal Ideation from Eye Movements | | 0
PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight | | 0
Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant | Code | 0
Beyond Whole Dialogue Modeling: Contextual Disentanglement for Conversational Recommendation | | 0
LegalRAG: A Hybrid RAG System for Multilingual Legal Information Retrieval | | 0
Accommodate Knowledge Conflicts in Retrieval-augmented LLMs: Towards Reliable Response Generation in the Wild | | 0
MSCRS: Multi-modal Semantic Graph Prompt Learning Framework for Conversational Recommender Systems | Code | 1
The Quantum LLM: Modeling Semantic Spaces with Quantum Principles | | 0
SafeChat: A Framework for Building Trustworthy Collaborative Assistants and a Case Study of its Usefulness | Code | 0
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model | | 0
AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence | Code | 0
Hawkeye:Efficient Reasoning with Model Collaboration | | 0
Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation | | 0
When LLM Therapists Become Salespeople: Evaluating Large Language Models for Ethical Motivational Interviewing | | 0
Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions | | 0
Clean & Clear: Feasibility of Safe LLM Clinical Guidance | | 0
DEMENTIA-PLAN: An Agent-Based Framework for Multi-Knowledge Graph Retrieval-Augmented Generation in Dementia Care | | 0
CoMAC: Conversational Agent for Multi-Source Auxiliary Context with Sparse and Symmetric Latent Interactions | Code | 0
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization | | 0
GINGER: Grounded Information Nugget-Based Generation of Responses | Code | 0
Conversational User-AI Intervention: A Study on Prompt Rewriting for Improved LLM Response Generation | | 0
Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking | Code | 13
FG-RAG: Enhancing Query-Focused Summarization with Context-Aware Fine-Grained Graph RAG | Code | 0
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models | | 0
Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models | Code | 11
Page 2 of 37

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PaCE | BLEU | 34.1 | | Unverified
2 | BART-large | BLEU | 33.1 | | Unverified
3 | BART-base | BLEU | 29.4 | | Unverified
4 | MTN | BLEU | 21.7 | | Unverified
5 | GPT-2 | BLEU | 19.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LED(Q,F) | Message-F1 | 19.54 | | Unverified
2 | LED(Q,P,H) | Message-F1 | 16.14 | | Unverified
3 | LED(Q,P) | Message-F1 | 14.25 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PaCE | BLEU | 22 | | Unverified
2 | SimpleTOD | BLEU | 20.3 | | Unverified