| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 |
| CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models | Aug 19, 2024 | Diversity, Language Modeling | Code Available | 1 |
| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | May 17, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| "Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process | May 4, 2023 | Moral Scenarios | Code Available | 1 |
| Evaluating the Moral Beliefs Encoded in LLMs | Jul 26, 2023 | Moral Scenarios, Survey | Code Available | 1 |
| Measuring Moral Inconsistencies in Large Language Models | Jan 26, 2024 | Decision Making, Language Modeling | Unverified | 0 |
| Moral Sparks in Social Media Narratives | Oct 30, 2023 | Ethics, Informativeness | Unverified | 0 |
| The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making | Oct 9, 2024 | Decision Making, Moral Scenarios | Unverified | 0 |
| Learning Tractable Probabilistic Models for Moral Responsibility and Blame | Oct 8, 2018 | Decision Making, Management | Unverified | 0 |
| Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents | Dec 31, 2024 | Moral Scenarios | Unverified | 0 |