SOTAVerified

Decision Making

Papers

Showing 711-720 of 12311 papers

Title | Status | Hype
Bidirectional Model-based Policy Optimization | Code | 1
Benchmarks for Deep Off-Policy Evaluation | Code | 1
Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements | Code | 1
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Code | 1
ALMA: Hierarchical Learning for Composite Multi-Agent Tasks | Code | 1
From point forecasts to multivariate probabilistic forecasts: The Schaake shuffle for day-ahead electricity price forecasting | Code | 1
AvalonBench: Evaluating LLMs Playing the Game of Avalon | Code | 1
CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning | Code | 1
Benchmarking saliency methods for chest X-ray interpretation | Code | 1
BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified