SOTAVerified

Decision Making

Papers

Showing 276-300 of 12311 papers

Title | Status | Hype
CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Code | 1
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | Code | 1
A Generative Framework for Probabilistic, Spatiotemporally Coherent Downscaling of Climate Simulation | Code | 1
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning | Code | 1
Explainable Fuzzy Neural Network with Multi-Fidelity Reinforcement Learning for Micro-Architecture Design Space Exploration | Code | 1
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model | Code | 1
Digital Transformation in the Water Distribution System based on the Digital Twins Concept | Code | 1
SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot | Code | 1
AI-Driven Day-to-Day Route Choice | Code | 1
BIMCaP: BIM-based AI-supported LiDAR-Camera Pose Refinement | Code | 1
A Survey of Medical Vision-and-Language Applications and Their Techniques | Code | 1
AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant | Code | 1
Large-scale moral machine experiment on large language models | Code | 1
BayesianFitForecast: A User-Friendly R Toolbox for Parameter Estimation and Forecasting with Ordinary Differential Equations | Code | 1
Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning | Code | 1
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models | Code | 1
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback | Code | 1
DiffLight: A Partial Rewards Conditioned Diffusion Model for Traffic Signal Control with Missing Data | Code | 1
Toward Conditional Distribution Calibration in Survival Prediction | Code | 1
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Code | 1
Reflection-Bench: probing AI intelligence with reflection | Code | 1
A Comprehensive Evaluation of Cognitive Biases in LLMs | Code | 1
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Code | 1
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation | Code | 1
Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | - | Unverified