| Multi-Objective Causal Bayesian Optimization | Feb 20, 2025 | Bayesian Optimization, Decision Making | Code Available | 1 |
| STeCa: Step-level Trajectory Calibration for LLM Agent Learning | Feb 20, 2025 | Decision Making, Language Modeling | Code Available | 1 |
| How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation | Feb 20, 2025 | Decision Making | Code Available | 1 |
| Benchmarking LLMs for Political Science: A United Nations Perspective | Feb 19, 2025 | Benchmarking, Decision Making | Code Available | 1 |
| AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence | Feb 19, 2025 | Code Generation, Decision Making | Code Available | 1 |
| RobustX: Robust Counterfactual Explanations Made Easy | Feb 19, 2025 | counterfactual, Decision Making | Code Available | 1 |
| Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements | Feb 18, 2025 | Decision Making, Fraud Detection | Code Available | 1 |
| Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | Feb 17, 2025 | Decision Making | Code Available | 1 |
| SegX: Improving Interpretability of Clinical Image Diagnosis with Segmentation-based Enhancement | Feb 14, 2025 | Decision Making, Medical Image Analysis | Code Available | 1 |
| Habitizing Diffusion Planning for Efficient and Effective Decision Making | Feb 10, 2025 | CPU, D4RL | Code Available | 1 |
| RTBAgent: A LLM-based Agent System for Real-Time Bidding | Feb 2, 2025 | Decision Making | Code Available | 1 |
| Vintix: Action Model via In-Context Reinforcement Learning | Jan 31, 2025 | Decision Making, In-Context Reinforcement Learning | Code Available | 1 |
| Harnessing Diverse Perspectives: A Multi-Agent Framework for Enhanced Error Detection in Knowledge Graphs | Jan 27, 2025 | Decision Making, Knowledge Graphs | Code Available | 1 |
| A Survey of World Models for Autonomous Driving | Jan 20, 2025 | Anomaly Detection, Autonomous Driving | Code Available | 1 |
| MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Jan 20, 2025 | Decision Making, GSM8K | Code Available | 1 |
| NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes | Jan 16, 2025 | Decision Making | Code Available | 1 |
| O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning | Jan 11, 2025 | Decision Making, Diagnostic | Code Available | 1 |
| Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies | Jan 6, 2025 | Decision Making, Deep Reinforcement Learning | Code Available | 1 |
| ICFNet: Integrated Cross-modal Fusion Network for Survival Prediction | Jan 6, 2025 | Decision Making, Survival Prediction | Code Available | 1 |
| MIRAGE: Exploring How Large Language Models Perform in Complex Social Interactive Environments | Jan 3, 2025 | Decision Making | Code Available | 1 |
| Plancraft: an evaluation dataset for planning with LLM agents | Dec 30, 2024 | Decision Making, Minecraft | Code Available | 1 |
| Modality-Projection Universal Model for Comprehensive Full-Body Medical Imaging Segmentation | Dec 26, 2024 | Decision Making, Diagnostic | Code Available | 1 |
| Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Dec 25, 2024 | Decision Making, Offline RL | Code Available | 1 |
| Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion | Dec 23, 2024 | Decision Making, Multi-modal Classification | Code Available | 1 |
| LegalAgentBench: Evaluating LLM Agents in Legal Domain | Dec 23, 2024 | Decision Making | Code Available | 1 |
| CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Dec 23, 2024 | Decision Making, Math | Code Available | 1 |
| Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | Dec 19, 2024 | Contrastive Learning, Decision Making | Code Available | 1 |
| A Generative Framework for Probabilistic, Spatiotemporally Coherent Downscaling of Climate Simulation | Dec 19, 2024 | Decision Making | Code Available | 1 |
| Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning | Dec 15, 2024 | Decision Making, Large Language Model | Code Available | 1 |
| Explainable Fuzzy Neural Network with Multi-Fidelity Reinforcement Learning for Micro-Architecture Design Space Exploration | Dec 14, 2024 | Bayesian Optimization, Decision Making | Code Available | 1 |
| WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model | Dec 13, 2024 | Autonomous Driving, Decision Making | Code Available | 1 |
| Digital Transformation in the Water Distribution System based on the Digital Twins Concept | Dec 9, 2024 | Decision Making, Scheduling | Code Available | 1 |
| SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot | Dec 6, 2024 | Decision Making, RAG | Code Available | 1 |
| AI-Driven Day-to-Day Route Choice | Dec 4, 2024 | Decision Making, Reinforcement Learning (RL) | Code Available | 1 |
| BIMCaP: BIM-based AI-supported LiDAR-Camera Pose Refinement | Dec 4, 2024 | Decision Making, Management | Code Available | 1 |
| A Survey of Medical Vision-and-Language Applications and Their Techniques | Nov 19, 2024 | Decision Making, Diagnostic | Code Available | 1 |
| AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant | Nov 11, 2024 | Decision Making, Hallucination | Code Available | 1 |
| Large-scale moral machine experiment on large language models | Nov 11, 2024 | Autonomous Driving, Computational Efficiency | Code Available | 1 |
| BayesianFitForecast: A User-Friendly R Toolbox for Parameter Estimation and Forecasting with Ordinary Differential Equations | Nov 8, 2024 | Bayesian Inference, Decision Making | Code Available | 1 |
| Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning | Nov 7, 2024 | Decision Making, Fairness | Code Available | 1 |
| Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models | Nov 1, 2024 | Decision Making, Informativeness | Code Available | 1 |
| Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback | Oct 30, 2024 | Decision Making, Language Modeling | Code Available | 1 |
| DiffLight: A Partial Rewards Conditioned Diffusion Model for Traffic Signal Control with Missing Data | Oct 30, 2024 | Decision Making, Imputation | Code Available | 1 |
| Toward Conditional Distribution Calibration in Survival Prediction | Oct 27, 2024 | Conformal Prediction, Decision Making | Code Available | 1 |
| ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Oct 23, 2024 | Decision Making, Minecraft | Code Available | 1 |
| Reflection-Bench: probing AI intelligence with reflection | Oct 21, 2024 | counterfactual, Decision Making | Code Available | 1 |
| A Comprehensive Evaluation of Cognitive Biases in LLMs | Oct 20, 2024 | Decision Making | Code Available | 1 |
| MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Oct 17, 2024 | Decision Making, Language Modeling | Code Available | 1 |
| Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation | Oct 17, 2024 | Decision Making | Code Available | 1 |
| Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning | Oct 17, 2024 | Decision Making, Reinforcement Learning (RL) | Code Available | 1 |