SOTAVerified

Decision Making

Papers

Showing 271-280 of 12311 papers

Title | Status | Hype
Plancraft: an evaluation dataset for planning with LLM agents | Code | 1
Modality-Projection Universal Model for Comprehensive Full-Body Medical Imaging Segmentation | Code | 1
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning | Code | 1
LegalAgentBench: Evaluating LLM Agents in Legal Domain | Code | 1
CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Code | 1
Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion | Code | 1
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | Code | 1
A Generative Framework for Probabilistic, Spatiotemporally Coherent Downscaling of Climate Simulation | Code | 1
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning | Code | 1
Explainable Fuzzy Neural Network with Multi-Fidelity Reinforcement Learning for Micro-Architecture Design Space Exploration | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | - | Unverified