| Annotation alignment: Comparing LLM and human annotations of conversational safety | Jun 10, 2024 | Chatbot | Unverified | 0 |
| WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | Jun 7, 2024 | Benchmarking, Chatbot | Code Available | 3 |
| Speech-based Clinical Depression Screening: An Empirical Study | Jun 5, 2024 | Chatbot, Diagnostic | Unverified | 0 |
| The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches | Jun 5, 2024 | Chatbot, Information Retrieval | Unverified | 0 |
| MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures | Jun 3, 2024 | Chatbot, MMLU | Unverified | 0 |
| Demo: Soccer Information Retrieval via Natural Queries using SoccerRAG | Jun 3, 2024 | Chatbot, Information Retrieval | Code Available | 0 |
| Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study | Jun 3, 2024 | Chatbot, Language Modeling | Unverified | 0 |
| Inverse Constitutional AI: Compressing Preferences into Principles | Jun 2, 2024 | Chatbot, Language Modelling | Code Available | 1 |
| Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions | May 30, 2024 | Chatbot, Fairness | Code Available | 0 |
| Phantom: General Trigger Attacks on Retrieval Augmented Language Generation | May 30, 2024 | Adversarial Text, Chatbot | Unverified | 0 |
| Designing an Evaluation Framework for Large Language Models in Astronomy Research | May 30, 2024 | Astronomy, Chatbot | Code Available | 0 |
| Automatic detection of cognitive impairment in elderly people using an entertainment chatbot with Natural Language Processing capabilities | May 28, 2024 | Chatbot, Text Generation | Unverified | 0 |
| ChatGPT as the Marketplace of Ideas: Should Truth-Seeking Be the Goal of AI Content Governance? | May 28, 2024 | Chatbot | Unverified | 0 |
| Coaching Copilot: Blended Form of an LLM-Powered Chatbot and a Human Coach to Effectively Support Self-Reflection for Leadership Growth | May 24, 2024 | Chatbot, Form | Unverified | 0 |
| DuanzAI: Slang-Enhanced LLM with Prompt for Humor Understanding | May 23, 2024 | Chatbot | Code Available | 0 |
| Evaluation of the Programming Skills of Large Language Models | May 23, 2024 | Chatbot, Code Generation | Unverified | 0 |
| SimPO: Simple Preference Optimization with a Reference-Free Reward | May 23, 2024 | Chatbot, Instruction Following | Code Available | 4 |
| Evaluating Large Language Models with Human Feedback: Establishing a Swedish Benchmark | May 22, 2024 | Chatbot, Language Modeling | Code Available | 0 |
| From Human-to-Human to Human-to-Bot Conversations in Software Engineering | May 21, 2024 | Chatbot | Unverified | 0 |
| Can AI Relate: Testing Large Language Model Response for Mental Health Support | May 20, 2024 | Chatbot, Language Modeling | Code Available | 0 |
| Large Language Models Can Infer Personality from Free-Form User Interactions | May 19, 2024 | Chatbot, Form | Unverified | 0 |
| CPS-LLM: Large Language Model based Safe Usage Plan Generator for Human-in-the-Loop Human-in-the-Plant Cyber-Physical System | May 19, 2024 | Chatbot, Language Modeling | Unverified | 0 |
| SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks | May 17, 2024 | Chatbot, Dataset Generation | Unverified | 0 |
| Tailoring Vaccine Messaging with Common-Ground Opinions | May 17, 2024 | Chatbot, Misinformation | Code Available | 0 |
| From Questions to Insightful Answers: Building an Informed Chatbot for University Resources | May 13, 2024 | Chatbot, Language Modeling | Unverified | 0 |
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | Chatbot, HumanEval | Code Available | 5 |
| Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design | May 12, 2024 | Chatbot, Prompt Engineering | Unverified | 0 |
| Persona Inconstancy in Multi-Agent LLM Collaboration: Conformity, Confabulation, and Impersonation | May 6, 2024 | AI Agent, Chatbot | Code Available | 0 |
| MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline | May 6, 2024 | Chatbot | Code Available | 0 |
| MAmmoTH2: Scaling Instructions from the Web | May 6, 2024 | Chatbot, GSM8K | Unverified | 0 |
| WildChat: 1M ChatGPT Interaction Logs in the Wild | May 2, 2024 | Chatbot, Instruction Following | Unverified | 0 |
| From Keyboard to Chatbot: An AI-powered Integration Platform with Large-Language Models for Teaching Computational Thinking for Young Children | May 1, 2024 | Chatbot | Unverified | 0 |
| Lessons from the Use of Natural Language Inference (NLI) in Requirements Engineering Tasks | Apr 24, 2024 | Chatbot, Natural Language Inference | Unverified | 0 |
| Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant | Apr 24, 2024 | Chatbot, Diversity | Unverified | 0 |
| Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice | Apr 23, 2024 | Chatbot, Code Generation | Unverified | 0 |
| Using Adaptive Empathetic Responses for Teaching English | Apr 21, 2024 | Chatbot | Code Available | 0 |
| Incorporating Different Verbal Cues to Improve Text-Based Computer-Delivered Health Messaging | Apr 21, 2024 | Chatbot | Unverified | 0 |
| MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering | Apr 19, 2024 | Chatbot, Domain Adaptation | Unverified | 0 |
| LuminLab: An AI-Powered Building Retrofit and Energy Modelling Platform | Apr 14, 2024 | Chatbot | Unverified | 0 |
| Integrating Physiological Data with Large Language Models for Empathic Human-AI Interaction | Apr 14, 2024 | Chatbot, Physiological Computing | Unverified | 0 |
| Deceptive Patterns of Intelligent and Interactive Writing Assistants | Apr 14, 2024 | Chatbot | Unverified | 0 |
| Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators | Apr 6, 2024 | Chatbot, counterfactual | Code Available | 5 |
| Physics Event Classification Using Large Language Models | Apr 5, 2024 | Chatbot, Classification | Code Available | 0 |
| CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues | Apr 4, 2024 | Chatbot, Instruction Following | Unverified | 0 |
| Token Trails: Navigating Contextual Depths in Conversational AI with ChatLLM | Apr 3, 2024 | Chatbot, Navigate | Unverified | 0 |
| Entertainment chatbot for the digital inclusion of elderly people without abstraction capabilities | Mar 29, 2024 | Chatbot, Sentiment Analysis | Unverified | 0 |
| A Survey on Large Language Models from Concept to Implementation | Mar 27, 2024 | Chatbot, Image Captioning | Unverified | 0 |
| LARA: Linguistic-Adaptive Retrieval-Augmentation for Multi-Turn Intent Classification | Mar 25, 2024 | Chatbot, Classification | Unverified | 0 |
| Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review | Mar 22, 2024 | Chatbot, Drug Discovery | Unverified | 0 |
| Comprehensive Lipidomic Automation Workflow using Large Language Models | Mar 22, 2024 | AI Agent, Chatbot | Unverified | 0 |