SOTAVerified

Decision Making

Papers

Showing 401-425 of 12,311 papers

Title | Status | Hype
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries | Code | 1
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs | Code | 1
XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques | Code | 1
Dynamic planning in hierarchical active inference | Code | 1
Explaining generative diffusion models via visual analysis for interpretable decision-making process | Code | 1
PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control | Code | 1
Uncertainty Quantification for Forward and Inverse Problems of PDEs via Latent Global Evolution | Code | 1
Addressing cognitive bias in medical language models | Code | 1
TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection | Code | 1
A RAG-Based Multi-Agent LLM System for Natural Hazard Resilience and Adaptation | Code | 1
Self-Calibrating Conformal Prediction | Code | 1
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement | Code | 1
Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss | Code | 1
Sym-Q: Adaptive Symbolic Regression via Sequential Decision-Making | Code | 1
Conformal Convolution and Monte Carlo Meta-learners for Predictive Inference of Individual Treatment Effects | Code | 1
Measuring Implicit Bias in Explicitly Unbiased Large Language Models | Code | 1
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills | Code | 1
Deep hybrid models: infer and plan in a dynamic world | Code | 1
LLM Voting: Human Choices and AI Collective Decision Making | Code | 1
Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis | Code | 1
Prompting Large Language Models for Zero-Shot Clinical Prediction with Structured Longitudinal Electronic Health Record Data | Code | 1
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments | Code | 1
Distributional Counterfactual Explanations With Optimal Transport | Code | 1
Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric | Code | 1
ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified