Chatbot

A chatbot, or conversational AI, is a language model designed to hold conversations with humans.

Papers

Showing 301–325 of 971 papers

Title	Date	Tasks	Status	Hype
Annotation alignment: Comparing LLM and human annotations of conversational safety	Jun 10, 2024	Chatbot	Unverified	0
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild	Jun 7, 2024	Benchmarking, Chatbot	Code Available	3
Speech-based Clinical Depression Screening: An Empirical Study	Jun 5, 2024	Chatbot, Diagnostic	Unverified	0
The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches	Jun 5, 2024	Chatbot, Information Retrieval	Unverified	0
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures	Jun 3, 2024	Chatbot, MMLU	Unverified	0
Demo: Soccer Information Retrieval via Natural Queries using SoccerRAG	Jun 3, 2024	Chatbot, Information Retrieval	Code Available	0
Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study	Jun 3, 2024	Chatbot, Language Modeling	Unverified	0
Inverse Constitutional AI: Compressing Preferences into Principles	Jun 2, 2024	Chatbot, Language Modelling	Code Available	1
Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions	May 30, 2024	Chatbot, Fairness	Code Available	0
Phantom: General Trigger Attacks on Retrieval Augmented Language Generation	May 30, 2024	Adversarial Text, Chatbot	Unverified	0
Designing an Evaluation Framework for Large Language Models in Astronomy Research	May 30, 2024	Astronomy, Chatbot	Code Available	0
Automatic detection of cognitive impairment in elderly people using an entertainment chatbot with Natural Language Processing capabilities	May 28, 2024	Chatbot, Text Generation	Unverified	0
ChatGPT as the Marketplace of Ideas: Should Truth-Seeking Be the Goal of AI Content Governance?	May 28, 2024	Chatbot	Unverified	0
Coaching Copilot: Blended Form of an LLM-Powered Chatbot and a Human Coach to Effectively Support Self-Reflection for Leadership Growth	May 24, 2024	Chatbot, Form	Unverified	0
DuanzAI: Slang-Enhanced LLM with Prompt for Humor Understanding	May 23, 2024	Chatbot	Code Available	0
Evaluation of the Programming Skills of Large Language Models	May 23, 2024	Chatbot, Code Generation	Unverified	0
SimPO: Simple Preference Optimization with a Reference-Free Reward	May 23, 2024	Chatbot, Instruction Following	Code Available	4
Evaluating Large Language Models with Human Feedback: Establishing a Swedish Benchmark	May 22, 2024	Chatbot, Language Modeling	Code Available	0
From Human-to-Human to Human-to-Bot Conversations in Software Engineering	May 21, 2024	Chatbot	Unverified	0
Can AI Relate: Testing Large Language Model Response for Mental Health Support	May 20, 2024	Chatbot, Language Modeling	Code Available	0
Large Language Models Can Infer Personality from Free-Form User Interactions	May 19, 2024	Chatbot, Form	Unverified	0
CPS-LLM: Large Language Model based Safe Usage Plan Generator for Human-in-the-Loop Human-in-the-Plant Cyber-Physical System	May 19, 2024	Chatbot, Language Modeling	Unverified	0
SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks	May 17, 2024	Chatbot, Dataset Generation	Unverified	0
Tailoring Vaccine Messaging with Common-Ground Opinions	May 17, 2024	Chatbot, Misinformation	Code Available	0
From Questions to Insightful Answers: Building an Informed Chatbot for University Resources	May 13, 2024	Chatbot, Language Modeling	Unverified	0

Page 13 of 39

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Yi 34B Chat	Average win rate	27.2	—	Unverified