| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | May 17, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Measurement of LLM's Philosophies of Human Nature | Apr 3, 2025 | Moral Scenarios | Code Available | 0 |
| Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents | Dec 31, 2024 | Moral Scenarios | Unverified | 0 |
| M^3oralBench: A MultiModal Moral Benchmark for LVLMs | Dec 30, 2024 | Moral Scenarios | Code Available | 0 |
| Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses | Oct 10, 2024 | Moral Scenarios | Unverified | 0 |
| The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making | Oct 9, 2024 | Decision Making, Moral Scenarios | Unverified | 0 |
| CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models | Aug 19, 2024 | Diversity, Language Modeling | Code Available | 1 |
| Prompt and Prejudice | Aug 7, 2024 | Decision Making, Moral Scenarios | Unverified | 0 |
| SaGE: Evaluating Moral Consistency in Large Language Models | Feb 21, 2024 | Decision Making, HellaSwag | Code Available | 0 |
| Measuring Moral Inconsistencies in Large Language Models | Jan 26, 2024 | Decision Making, Language Modeling | Unverified | 0 |