SOTAVerified

Chatbot

A chatbot, or conversational AI, is a language model designed to hold conversations with humans.

Source: Open Data Chatbot


Papers

Showing 1-50 of 971 papers

Title | Status | Hype
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | Code | 14
Yi: Open Foundation Models by 01.AI | Code | 9
Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Code | 7
DeepSeek-VL: Towards Real-World Vision-Language Understanding | Code | 7
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset | Code | 7
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Code | 7
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | Code | 7
h2oGPT: Democratizing Large Language Models | Code | 6
Mistral 7B | Code | 6
QLoRA: Efficient Finetuning of Quantized LLMs | Code | 6
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline | Code | 5
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators | Code | 5
RLHF Workflow: From Reward Modeling to Online RLHF | Code | 5
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Code | 5
SimPO: Simple Preference Optimization with a Reference-Free Reward | Code | 4
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | Code | 4
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements | Code | 3
ELIZA Reanimated: The world's first chatbot restored on the world's first time sharing system | Code | 3
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia | Code | 3
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | Code | 3
Prompt-to-Leaderboard | Code | 3
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Code | 3
Improving Model Evaluation using SMART Filtering of Benchmark Datasets | Code | 3
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | Code | 3
CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning | Code | 2
SMILE: Single-turn to Multi-turn Inclusive Language Expansion via ChatGPT for Mental Health Support | Code | 2
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Code | 2
Language Model Powered Digital Biology with BRAD | Code | 2
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training | Code | 2
EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education | Code | 2
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Code | 2
Efficient LLM Scheduling by Learning to Rank | Code | 2
Ten Quick Tips for Harnessing the Power of ChatGPT/GPT-4 in Computational Biology | Code | 2
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding | Code | 2
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Code | 2
Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction | Code | 2
MemoryBank: Enhancing Large Language Models with Long-Term Memory | Code | 2
LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation | Code | 2
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models | Code | 2
CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients | Code | 1
Causal Inference for Chatting Handoff | Code | 1
From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process | Code | 1
Few Shot Dialogue State Tracking using Meta-learning | Code | 1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models | Code | 1
Faithful Persona-based Conversational Dataset Generation with Large Language Models | Code | 1
Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency | Code | 1
ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction | Code | 1
BioImage.IO Chatbot: A Community-Driven AI Assistant for Integrative Computational Bioimaging | Code | 1
Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Yi 34B Chat | Average win rate | 27.2 | - | Unverified