| Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | Mar 7, 2024 | Chatbot | Code Available | 14 |
| Yi: Open Foundation Models by 01.AI | Mar 7, 2024 | Attribute, Chatbot | Code Available | 9 |
| GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Dec 3, 2024 | Automatic Speech Recognition (ASR) | Code Available | 7 |
| Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech Recognition (ASR) | Code Available | 7 |
| DeepSeek-VL: Towards Real-World Vision-Language Understanding | Mar 8, 2024 | Chatbot, Language Modelling | Code Available | 7 |
| LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset | Sep 21, 2023 | Chatbot, Diversity | Code Available | 7 |
| Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | Jun 9, 2023 | Chatbot, Language Modelling | Code Available | 7 |
| Mistral 7B | Oct 10, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 |
| h2oGPT: Democratizing Large Language Models | Jun 13, 2023 | Chatbot, Fairness | Code Available | 6 |
| QLoRA: Efficient Finetuning of Quantized LLMs | May 23, 2023 | Chatbot, GPU | Code Available | 6 |
| Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Aug 22, 2024 | Chatbot, Instruction Following | Code Available | 5 |
| From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline | Jun 17, 2024 | Chatbot | Code Available | 5 |
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | Chatbot, HumanEval | Code Available | 5 |
| Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators | Apr 6, 2024 | Chatbot, counterfactual | Code Available | 5 |
| SimPO: Simple Preference Optimization with a Reference-Free Reward | May 23, 2024 | Chatbot, Instruction Following | Code Available | 4 |
| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | Apr 3, 2023 | Chatbot, Language Modeling | Code Available | 4 |
| LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | May 5, 2025 | Chatbot, Decoder | Code Available | 3 |
| Prompt-to-Leaderboard | Feb 20, 2025 | Chatbot, Language Modeling | Code Available | 3 |
| ELIZA Reanimated: The world's first chatbot restored on the world's first time sharing system | Jan 12, 2025 | Chatbot | Code Available | 3 |
| Improving Model Evaluation using SMART Filtering of Benchmark Datasets | Oct 26, 2024 | Chatbot, Diversity | Code Available | 3 |
| PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements | Jul 22, 2024 | Chatbot | Code Available | 3 |
| Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Jun 12, 2024 | Benchmarking, Chatbot | Code Available | 3 |
| WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | Jun 7, 2024 | Benchmarking, Chatbot | Code Available | 3 |
| WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia | May 23, 2023 | Chatbot, Hallucination | Code Available | 3 |
| SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | May 22, 2025 | Bug fixing, Chatbot | Code Available | 2 |
| Language Model Powered Digital Biology with BRAD | Sep 4, 2024 | Chatbot, Code Generation | Code Available | 2 |
| Efficient LLM Scheduling by Learning to Rank | Aug 28, 2024 | Blocking, Chatbot | Code Available | 2 |
| WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Jun 26, 2024 | Chatbot, Red Teaming | Code Available | 2 |
| Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction | Feb 28, 2024 | Chatbot, Reconstruction Attack | Code Available | 2 |
| SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding | Feb 14, 2024 | Chatbot, Code Generation | Code Available | 2 |
| LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation | Dec 28, 2023 | Answer Generation, Chatbot | Code Available | 2 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | Chatbot, Image Captioning | Code Available | 2 |
| EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education | Aug 5, 2023 | Chatbot, Language Modeling | Code Available | 2 |
| Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models | May 24, 2023 | Chatbot, Natural Language Understanding | Code Available | 2 |
| MemoryBank: Enhancing Large Language Models with Long-Term Memory | May 17, 2023 | Chatbot | Code Available | 2 |
| SMILE: Single-turn to Multi-turn Inclusive Language Expansion via ChatGPT for Mental Health Support | Apr 30, 2023 | Chatbot | Code Available | 2 |
| Ten Quick Tips for Harnessing the Power of ChatGPT/GPT-4 in Computational Biology | Mar 29, 2023 | Chatbot, Prompt Engineering | Code Available | 2 |
| CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning | Apr 18, 2022 | Chatbot, Offline RL | Code Available | 2 |
| EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training | Mar 17, 2022 | Chatbot | Code Available | 2 |
| Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models | May 19, 2025 | Benchmarking, Chatbot | Code Available | 1 |
| What is Stigma Attributed to? A Theory-Grounded, Expert-Annotated Interview Corpus for Demystifying Mental-Health Stigma | May 19, 2025 | Chatbot | Code Available | 1 |
| CHARM: Calibrating Reward Models With Chatbot Arena Scores | Apr 14, 2025 | Chatbot | Code Available | 1 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Mar 11, 2025 | Chatbot, Language Modeling | Code Available | 1 |
| Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications | Feb 16, 2025 | Chatbot, Language Modeling | Code Available | 1 |
| Improving Your Model Ranking on Chatbot Arena by Vote Rigging | Jan 29, 2025 | Chatbot | Code Available | 1 |
| MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Dec 20, 2024 | Cancer Classification, Chatbot | Code Available | 1 |
| TransitGPT: A Generative AI-based framework for interacting with GTFS data using Large Language Models | Dec 7, 2024 | Chatbot, Natural Language Queries | Code Available | 1 |
| Learning to Assist Humans without Inferring Rewards | Nov 4, 2024 | Chatbot, reinforcement-learning | Code Available | 1 |
| Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents | Oct 11, 2024 | Chatbot, Red Teaming | Code Available | 1 |
| A Recipe For Building a Compliant Real Estate Chatbot | Oct 7, 2024 | Chatbot, Instruction Following | Code Available | 1 |