SOTAVerified

Decision Making

Papers

Showing 101-125 of 12311 papers

| Title | Status | Hype |
| --- | --- | --- |
| Distribution-Free, Risk-Controlling Prediction Sets | Code | 2 |
| Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning | Code | 2 |
| Aligning Superhuman AI with Human Behavior: Chess as a Model System | Code | 2 |
| Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning | Code | 2 |
| FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making | Code | 2 |
| A Review of Safe Reinforcement Learning: Methods, Theory and Applications | Code | 2 |
| DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation | Code | 2 |
| Digital Player: Evaluating Large Language Models based Human-like Agent in Games | Code | 2 |
| ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities | Code | 2 |
| DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models | Code | 2 |
| AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | Code | 2 |
| Diffusion Actor-Critic with Entropy Regulator | Code | 2 |
| Disentangling Memory and Reasoning Ability in Large Language Models | Code | 2 |
| Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | Code | 2 |
| Agentic Knowledgeable Self-awareness | Code | 2 |
| Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions | Code | 2 |
| Graph-of-Thought: Utilizing Large Language Models to Solve Complex and Dynamic Business Problems | Code | 2 |
| Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning | Code | 2 |
| DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning | Code | 2 |
| Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR | Code | 2 |
| Adversarial attacks and defenses in explainable artificial intelligence: A survey | Code | 2 |
| Improving Causal Reasoning in Large Language Models: A Survey | Code | 2 |
| Cross-Prediction-Powered Inference | Code | 2 |
| Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent | Code | 2 |
| Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified |