SOTAVerified

Decision Making

Papers

Showing 26-50 of 12311 papers

Title | Status | Hype
AutoWebGLM: A Large Language Model-based Web Navigating Agent | Code | 4
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Code | 4
Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web | Code | 4
A Survey on Large Language Model-Based Game Agents | Code | 4
Eureka: Human-Level Reward Design via Coding Large Language Models | Code | 4
TorchRL: A data-driven decision-making library for PyTorch | Code | 4
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning | Code | 4
pgmpy: A Python Toolkit for Bayesian Networks | Code | 4
Cognitive Architectures for Language Agents | Code | 4
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents | Code | 4
Constitutional AI: Harmlessness from AI Feedback | Code | 4
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Code | 3
MineStudio: A Streamlined Package for Minecraft AI Agent Development | Code | 3
A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making | Code | 3
Sentiment Reasoning for Healthcare | Code | 3
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Code | 3
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | Code | 3
Game-theoretic LLM: Agent Workflow for Negotiation Games | Code | 3
Hierarchical Prompting Assists Large Language Model on Web Navigation | Code | 3
Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Code | 3
Automatic Gradient Estimation for Calibrating Crowd Models with Discrete Decision Making | Code | 3
Evolve Cost-aware Acquisition Functions Using Large Language Models | Code | 3
Automated Hypothesis Validation with Agentic Sequential Falsifications | Code | 3
Evaluating Language Model Agency through Negotiations | Code | 3
Attention is not not Explanation | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified