| Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | Mar 7, 2024 | Chatbot | Code Available | 14 | 5 |
| Yi: Open Foundation Models by 01.AI | Mar 7, 2024 | Attribute, Chatbot | Code Available | 9 | 5 |
| Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 7 | 5 |
| DeepSeek-VL: Towards Real-World Vision-Language Understanding | Mar 8, 2024 | Chatbot, Language Modelling | Code Available | 7 | 5 |
| LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset | Sep 21, 2023 | Chatbot, Diversity | Code Available | 7 | 5 |
| GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Dec 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 7 | 5 |
| Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | Jun 9, 2023 | Chatbot, Language Modelling | Code Available | 7 | 5 |
| h2oGPT: Democratizing Large Language Models | Jun 13, 2023 | Chatbot, Fairness | Code Available | 6 | 5 |
| Mistral 7B | Oct 10, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 | 5 |
| QLoRA: Efficient Finetuning of Quantized LLMs | May 23, 2023 | Chatbot, GPU | Code Available | 6 | 5 |
| From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline | Jun 17, 2024 | Chatbot | Code Available | 5 | 5 |
| Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators | Apr 6, 2024 | Chatbot, counterfactual | Code Available | 5 | 5 |
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | Chatbot, HumanEval | Code Available | 5 | 5 |
| Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Aug 22, 2024 | Chatbot, Instruction Following | Code Available | 5 | 5 |
| SimPO: Simple Preference Optimization with a Reference-Free Reward | May 23, 2024 | Chatbot, Instruction Following | Code Available | 4 | 5 |
| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | Apr 3, 2023 | Chatbot, Language Modeling | Code Available | 4 | 5 |
| PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements | Jul 22, 2024 | Chatbot | Code Available | 3 | 5 |
| ELIZA Reanimated: The world's first chatbot restored on the world's first time sharing system | Jan 12, 2025 | Chatbot | Code Available | 3 | 5 |
| WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia | May 23, 2023 | Chatbot, Hallucination | Code Available | 3 | 5 |
| WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | Jun 7, 2024 | Benchmarking, Chatbot | Code Available | 3 | 5 |
| Prompt-to-Leaderboard | Feb 20, 2025 | Chatbot, Language Modeling | Code Available | 3 | 5 |
| Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Jun 12, 2024 | Benchmarking, Chatbot | Code Available | 3 | 5 |
| Improving Model Evaluation using SMART Filtering of Benchmark Datasets | Oct 26, 2024 | Chatbot, Diversity | Code Available | 3 | 5 |
| LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | May 5, 2025 | Chatbot, Decoder | Code Available | 3 | 5 |
| CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning | Apr 18, 2022 | Chatbot, Offline RL | Code Available | 2 | 5 |
| SMILE: Single-turn to Multi-turn Inclusive Language Expansion via ChatGPT for Mental Health Support | Apr 30, 2023 | Chatbot | Code Available | 2 | 5 |
| WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Jun 26, 2024 | Chatbot, Red Teaming | Code Available | 2 | 5 |
| Language Model Powered Digital Biology with BRAD | Sep 4, 2024 | Chatbot, Code Generation | Code Available | 2 | 5 |
| EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training | Mar 17, 2022 | Chatbot | Code Available | 2 | 5 |
| EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education | Aug 5, 2023 | Chatbot, Language Modeling | Code Available | 2 | 5 |
| SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | May 22, 2025 | Bug fixing, Chatbot | Code Available | 2 | 5 |
| Efficient LLM Scheduling by Learning to Rank | Aug 28, 2024 | Blocking, Chatbot | Code Available | 2 | 5 |
| Ten Quick Tips for Harnessing the Power of ChatGPT/GPT-4 in Computational Biology | Mar 29, 2023 | Chatbot, Prompt Engineering | Code Available | 2 | 5 |
| SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding | Feb 14, 2024 | Chatbot, Code Generation | Code Available | 2 | 5 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | Chatbot, Image Captioning | Code Available | 2 | 5 |
| Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction | Feb 28, 2024 | Chatbot, Reconstruction Attack | Code Available | 2 | 5 |
| MemoryBank: Enhancing Large Language Models with Long-Term Memory | May 17, 2023 | Chatbot | Code Available | 2 | 5 |
| LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation | Dec 28, 2023 | Answer Generation, Chatbot | Code Available | 2 | 5 |
| Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models | May 24, 2023 | Chatbot, Natural Language Understanding | Code Available | 2 | 5 |
| CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients | Feb 7, 2024 | Chatbot | Code Available | 1 | 5 |
| Causal Inference for Chatting Handoff | Oct 6, 2022 | Causal Inference, Chatbot | Code Available | 1 | 5 |
| From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process | Jan 26, 2024 | Chatbot, RAG | Code Available | 1 | 5 |
| Few Shot Dialogue State Tracking using Meta-learning | Jan 17, 2021 | Chatbot, Dialogue State Tracking | Code Available | 1 | 5 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Mar 11, 2025 | Chatbot, Language Modeling | Code Available | 1 | 5 |
| Bring Your Own Data! Self-Supervised Evaluation for Large Language Models | Jun 23, 2023 | Chatbot, Language Modeling | Code Available | 1 | 5 |
| Faithful Persona-based Conversational Dataset Generation with Large Language Models | Dec 15, 2023 | Chatbot, Dataset Generation | Code Available | 1 | 5 |
| Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency | Jun 4, 2021 | Chatbot, Natural Language Inference | Code Available | 1 | 5 |
| ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction | Jul 1, 2022 | Chatbot, Grammatical Error Correction | Code Available | 1 | 5 |
| BioImage.IO Chatbot: A Community-Driven AI Assistant for Integrative Computational Bioimaging | Oct 23, 2023 | Chatbot, Information Retrieval | Code Available | 1 | 5 |
| Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation | Jun 28, 2023 | Chatbot, Dialogue Generation | Code Available | 1 | 5 |