| How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation | Feb 20, 2025 | Decision Making | Code Available | 1 |
| STeCa: Step-level Trajectory Calibration for LLM Agent Learning | Feb 20, 2025 | Decision Making, Language Modeling | Code Available | 1 |
| Multi-Objective Causal Bayesian Optimization | Feb 20, 2025 | Bayesian Optimization, Decision Making | Code Available | 1 |
| AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence | Feb 19, 2025 | Code Generation, Decision Making | Code Available | 1 |
| RobustX: Robust Counterfactual Explanations Made Easy | Feb 19, 2025 | counterfactual, Decision Making | Code Available | 1 |
| Benchmarking LLMs for Political Science: A United Nations Perspective | Feb 19, 2025 | Benchmarking, Decision Making | Code Available | 1 |
| Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements | Feb 18, 2025 | Decision Making, Fraud Detection | Code Available | 1 |
| Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | Feb 17, 2025 | Decision Making | Code Available | 1 |
| SegX: Improving Interpretability of Clinical Image Diagnosis with Segmentation-based Enhancement | Feb 14, 2025 | Decision Making, Medical Image Analysis | Code Available | 1 |
| Habitizing Diffusion Planning for Efficient and Effective Decision Making | Feb 10, 2025 | CPU, D4RL | Code Available | 1 |
| RTBAgent: A LLM-based Agent System for Real-Time Bidding | Feb 2, 2025 | Decision Making | Code Available | 1 |
| Vintix: Action Model via In-Context Reinforcement Learning | Jan 31, 2025 | Decision Making, In-Context Reinforcement Learning | Code Available | 1 |
| Harnessing Diverse Perspectives: A Multi-Agent Framework for Enhanced Error Detection in Knowledge Graphs | Jan 27, 2025 | Decision Making, Knowledge Graphs | Code Available | 1 |
| MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Jan 20, 2025 | Decision Making, GSM8K | Code Available | 1 |
| A Survey of World Models for Autonomous Driving | Jan 20, 2025 | Anomaly Detection, Autonomous Driving | Code Available | 1 |
| NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes | Jan 16, 2025 | Decision Making | Code Available | 1 |
| O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning | Jan 11, 2025 | Decision Making, Diagnostic | Code Available | 1 |
| ICFNet: Integrated Cross-modal Fusion Network for Survival Prediction | Jan 6, 2025 | Decision Making, Survival Prediction | Code Available | 1 |
| Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies | Jan 6, 2025 | Decision Making, Deep Reinforcement Learning | Code Available | 1 |
| MIRAGE: Exploring How Large Language Models Perform in Complex Social Interactive Environments | Jan 3, 2025 | Decision Making | Code Available | 1 |
| Plancraft: an evaluation dataset for planning with LLM agents | Dec 30, 2024 | Decision Making, Minecraft | Code Available | 1 |
| Modality-Projection Universal Model for Comprehensive Full-Body Medical Imaging Segmentation | Dec 26, 2024 | Decision Making, Diagnostic | Code Available | 1 |
| Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Dec 25, 2024 | Decision Making, Offline RL | Code Available | 1 |
| Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion | Dec 23, 2024 | Decision Making, Multi-modal Classification | Code Available | 1 |
| LegalAgentBench: Evaluating LLM Agents in Legal Domain | Dec 23, 2024 | Decision Making | Code Available | 1 |