SOTAVerified

Decision Making

Papers

Showing 251-260 of 12,311 papers

| Title | Status | Hype |
|---|---|---|
| STeCa: Step-level Trajectory Calibration for LLM Agent Learning | Code | 1 |
| Multi-Objective Causal Bayesian Optimization | Code | 1 |
| How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation | Code | 1 |
| AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence | Code | 1 |
| Benchmarking LLMs for Political Science: A United Nations Perspective | Code | 1 |
| RobustX: Robust Counterfactual Explanations Made Easy | Code | 1 |
| Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements | Code | 1 |
| Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | Code | 1 |
| SegX: Improving Interpretability of Clinical Image Diagnosis with Segmentation-based Enhancement | Code | 1 |
| Habitizing Diffusion Planning for Efficient and Effective Decision Making | Code | 1 |
Page 26 of 1232

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified |