SOTAVerified

Decision Making

Papers

Showing 401–450 of 12311 papers

Title | Status | Hype
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries | Code | 1
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs | Code | 1
XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques | Code | 1
Dynamic planning in hierarchical active inference | Code | 1
PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control | Code | 1
Explaining generative diffusion models via visual analysis for interpretable decision-making process | Code | 1
Uncertainty Quantification for Forward and Inverse Problems of PDEs via Latent Global Evolution | Code | 1
TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection | Code | 1
Addressing cognitive bias in medical language models | Code | 1
A RAG-Based Multi-Agent LLM System for Natural Hazard Resilience and Adaptation | Code | 1
Self-Calibrating Conformal Prediction | Code | 1
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement | Code | 1
Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss | Code | 1
Conformal Convolution and Monte Carlo Meta-learners for Predictive Inference of Individual Treatment Effects | Code | 1
Sym-Q: Adaptive Symbolic Regression via Sequential Decision-Making | Code | 1
Measuring Implicit Bias in Explicitly Unbiased Large Language Models | Code | 1
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills | Code | 1
Deep hybrid models: infer and plan in a dynamic world | Code | 1
LLM Voting: Human Choices and AI Collective Decision Making | Code | 1
Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis | Code | 1
Prompting Large Language Models for Zero-Shot Clinical Prediction with Structured Longitudinal Electronic Health Record Data | Code | 1
Distributional Counterfactual Explanations With Optimal Transport | Code | 1
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments | Code | 1
Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric | Code | 1
ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization | Code | 1
PCB-Vision: A Multiscene RGB-Hyperspectral Benchmark Dataset of Printed Circuit Boards | Code | 1
Uncertainty quantification for probabilistic machine learning in earth observation using conformal prediction | Code | 1
Uncertainty Quantification on Clinical Trial Outcome Prediction | Code | 1
Escalation Risks from Language Models in Military and Diplomatic Decision-Making | Code | 1
t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making | Code | 1
Representation Learning of Multivariate Time Series using Attention and Adversarial Training | Code | 1
SwapTransformer: highway overtaking tactical planner model via imitation learning on OSHA dataset | Code | 1
IdentiFace : A VGG Based Multimodal Facial Biometric System | Code | 1
Autonomous Driving using Residual Sensor Fusion and Deep Reinforcement Learning | Code | 1
PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning | Code | 1
LLM-SAP: Large Language Models Situational Awareness Based Planning | Code | 1
Multimodal Gen-AI for Fundamental Investment Research | Code | 1
DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge | Code | 1
Scalable Agent-Based Modeling for Complex Financial Market Simulations | Code | 1
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA | Code | 1
FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1
Parameterized Decision-making with Multi-modal Perception for Autonomous Driving | Code | 1
Transformers in Unsupervised Structure-from-Motion | Code | 1
auto-sktime: Automated Time Series Forecasting | Code | 1
diff History for Neural Language Agents | Code | 1
Sequential Planning in Large Partially Observable Environments guided by LLMs | Code | 1
Open Datasheets: Machine-readable Documentation for Open Datasets and Responsible AI Assessments | Code | 1
BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving | Code | 1
DiffAIL: Diffusion Adversarial Imitation Learning | Code | 1
Using Large Language Models for Hyperparameter Optimization | Code | 1
Page 9 of 247

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified