| Bidirectional Model-based Policy Optimization | Jul 4, 2020 | Decision Making, model | Code Available | 1 |
| Benchmarks for Deep Off-Policy Evaluation | Mar 30, 2021 | Benchmarking, continuous-control | Code Available | 1 |
| Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements | Feb 18, 2025 | Decision Making, Fraud Detection | Code Available | 1 |
| Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Oct 30, 2023 | Decision Making, Offline RL | Code Available | 1 |
| ALMA: Hierarchical Learning for Composite Multi-Agent Tasks | May 27, 2022 | Decision Making, Inductive Bias | Code Available | 1 |
| From point forecasts to multivariate probabilistic forecasts: The Schaake shuffle for day-ahead electricity price forecasting | Apr 21, 2022 | Decision Making, Prediction Intervals | Code Available | 1 |
| AvalonBench: Evaluating LLMs Playing the Game of Avalon | Oct 8, 2023 | Decision Making | Code Available | 1 |
| CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning | Oct 10, 2019 | Autonomous Driving, Decision Making | Code Available | 1 |
| Benchmarking saliency methods for chest X-ray interpretation | Oct 10, 2022 | Benchmarking, Decision Making | Code Available | 1 |
| BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations | May 31, 2023 | Autonomous Driving, Decision Making | Code Available | 1 |