SOTAVerified

Decision Making

Papers

Showing 476-500 of 12311 papers

| Title | Status | Hype |
|---|---|---|
| DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation | Code | 1 |
| Advances in Embodied Navigation Using Large Language Models: A Survey | Code | 1 |
| Interpretable Prototype-based Graph Information Bottleneck | Code | 1 |
| Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Code | 1 |
| Hierarchical Framework for Interpretable and Probabilistic Model-Based Safe Reinforcement Learning | Code | 1 |
| EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images | Code | 1 |
| Tree Prompting: Efficient Task Adaptation without Fine-Tuning | Code | 1 |
| EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities | Code | 1 |
| Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes | Code | 1 |
| Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis | Code | 1 |
| On Statistical Learning of Branch and Bound for Vehicle Routing Optimization | Code | 1 |
| QACHECK: A Demonstration System for Question-Guided Multi-Hop Fact-Checking | Code | 1 |
| Explainable Image Similarity: Integrating Siamese Networks and Grad-CAM | Code | 1 |
| Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT | Code | 1 |
| What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models | Code | 1 |
| Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models | Code | 1 |
| AvalonBench: Evaluating LLMs Playing the Game of Avalon | Code | 1 |
| Deep Learning for Two-Stage Robust Integer Optimization | Code | 1 |
| Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets | Code | 1 |
| Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning | Code | 1 |
| MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use | Code | 1 |
| Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation | Code | 1 |
| Towards Robust Fidelity for Evaluating Explainability of Graph Neural Networks | Code | 1 |
| Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Code | 1 |
| Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified |