| STeCa: Step-level Trajectory Calibration for LLM Agent Learning | Feb 20, 2025 | Decision Making, Language Modeling | Code Available | 1 |
| Multi-Objective Causal Bayesian Optimization | Feb 20, 2025 | Bayesian Optimization, Decision Making | Code Available | 1 |
| How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation | Feb 20, 2025 | Decision Making | Code Available | 1 |
| AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence | Feb 19, 2025 | Code Generation, Decision Making | Code Available | 1 |
| Benchmarking LLMs for Political Science: A United Nations Perspective | Feb 19, 2025 | Benchmarking, Decision Making | Code Available | 1 |
| RobustX: Robust Counterfactual Explanations Made Easy | Feb 19, 2025 | counterfactual, Decision Making | Code Available | 1 |
| Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements | Feb 18, 2025 | Decision Making, Fraud Detection | Code Available | 1 |
| Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | Feb 17, 2025 | Decision Making | Code Available | 1 |
| SegX: Improving Interpretability of Clinical Image Diagnosis with Segmentation-based Enhancement | Feb 14, 2025 | Decision Making, Medical Image Analysis | Code Available | 1 |
| Habitizing Diffusion Planning for Efficient and Effective Decision Making | Feb 10, 2025 | CPU, D4RL | Code Available | 1 |