| Plancraft: an evaluation dataset for planning with LLM agents | Dec 30, 2024 | Decision Making, Minecraft | Code Available | 1 |
| Modality-Projection Universal Model for Comprehensive Full-Body Medical Imaging Segmentation | Dec 26, 2024 | Decision Making, Diagnostic | Code Available | 1 |
| Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Dec 25, 2024 | Decision Making, Offline RL | Code Available | 1 |
| LegalAgentBench: Evaluating LLM Agents in Legal Domain | Dec 23, 2024 | Decision Making | Code Available | 1 |
| CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Dec 23, 2024 | Decision Making, Math | Code Available | 1 |
| Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion | Dec 23, 2024 | Decision Making, Multi-modal Classification | Code Available | 1 |
| Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | Dec 19, 2024 | Contrastive Learning, Decision Making | Code Available | 1 |
| A Generative Framework for Probabilistic, Spatiotemporally Coherent Downscaling of Climate Simulation | Dec 19, 2024 | Decision Making | Code Available | 1 |
| Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning | Dec 15, 2024 | Decision Making, Large Language Model | Code Available | 1 |
| Explainable Fuzzy Neural Network with Multi-Fidelity Reinforcement Learning for Micro-Architecture Design Space Exploration | Dec 14, 2024 | Bayesian Optimization, Decision Making | Code Available | 1 |