SOTAVerified

Decision Making

Papers

Showing 301-350 of 12311 papers

| Title | Status | Hype |
| --- | --- | --- |
| AT-RAG: An Adaptive RAG Model Enhancing Query Efficiency with Topic Filtering and Iterative Reasoning | Code | 1 |
| Persistent Topological Features in Large Language Models | Code | 1 |
| Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning | Code | 1 |
| DiffPO: A causal diffusion model for learning distributions of potential outcomes | Code | 1 |
| Rejecting Hallucinated State Targets during Planning | Code | 1 |
| Predictive Coding for Decision Transformer | Code | 1 |
| ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration | Code | 1 |
| On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Code | 1 |
| End-to-End Conformal Calibration for Optimization Under Uncertainty | Code | 1 |
| BuildingView: Constructing Urban Building Exteriors Databases with Street View Imagery and Multimodal Large Language Mode | Code | 1 |
| Mastering Chess with a Transformer Model | Code | 1 |
| MusicLIME: Explainable Multimodal Music Understanding | Code | 1 |
| WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks | Code | 1 |
| SPACE: A Python-based Simulator for Evaluating Decentralized Multi-Robot Task Allocation Algorithms | Code | 1 |
| Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization | Code | 1 |
| MAS4POI: a Multi-Agents Collaboration System for Next POI Recommendation | Code | 1 |
| Explainable AI for computational pathology identifies model limitations and tissue biomarkers | Code | 1 |
| Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification | Code | 1 |
| EPO: Hierarchical LLM Agents with Environment Preference Optimization | Code | 1 |
| ml_edm package: a Python toolkit for Machine Learning based Early Decision Making | Code | 1 |
| MedDec: A Dataset for Extracting Medical Decisions from Discharge Summaries | Code | 1 |
| BLADE: Benchmarking Language Model Agents for Data-Driven Science | Code | 1 |
| EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics | Code | 1 |
| Self-Explainable Graph Transformer for Link Sign Prediction | Code | 1 |
| Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization | Code | 1 |
| A semantic embedding space based on large language models for modelling human beliefs | Code | 1 |
| Unleashing Artificial Cognition: Integrating Multiple AI Systems | Code | 1 |
| Adaptive Two-Stage Cloud Resource Scaling via Hierarchical Multi-Indicator Forecasting and Bayesian Decision-Making | Code | 1 |
| PateGail: A Privacy-Preserving Mobility Trajectory Generator with Imitation Learning | Code | 1 |
| Reinforcement Learning Pair Trading: A Dynamic Scaling approach | Code | 1 |
| Hyp2Nav: Hyperbolic Planning and Curiosity for Crowd Navigation | Code | 1 |
| ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos | Code | 1 |
| InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains | Code | 1 |
| Can Learned Optimization Make Reinforcement Learning Less Difficult? | Code | 1 |
| Integrating Clinical Knowledge into Concept Bottleneck Models | Code | 1 |
| A Mamba-based Siamese Network for Remote Sensing Change Detection | Code | 1 |
| Language Model Alignment in Multilingual Trolley Problems | Code | 1 |
| PUZZLES: A Benchmark for Neural Algorithmic Reasoning | Code | 1 |
| CELLO: Causal Evaluation of Large Vision-Language Models | Code | 1 |
| Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis | Code | 1 |
| Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making | Code | 1 |
| What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation | Code | 1 |
| ImageFlowNet: Forecasting Multiscale Image-Level Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images | Code | 1 |
| Ask-before-Plan: Proactive Language Agents for Real-World Planning | Code | 1 |
| Statistical Uncertainty in Word Embeddings: GloVe-V | Code | 1 |
| LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data | Code | 1 |
| Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling | Code | 1 |
| Open Grounded Planning: Challenges and Benchmark Construction | Code | 1 |
| RATT: A Thought Structure for Coherent and Correct LLM Reasoning | Code | 1 |
| Towards Rationality in Language and Multimodal Agents: A Survey | Code | 1 |
Page 7 of 247

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified |