SOTAVerified

Decision Making

Papers

Showing 491-500 of 12311 papers

Title | Status | Hype
Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models | Code | 1
AvalonBench: Evaluating LLMs Playing the Game of Avalon | Code | 1
Deep Learning for Two-Stage Robust Integer Optimization | Code | 1
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets | Code | 1
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning | Code | 1
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use | Code | 1
Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation | Code | 1
Towards Robust Fidelity for Evaluating Explainability of Graph Neural Networks | Code | 1
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Code | 1
Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified