SOTAVerified

Decision Making

Papers

Showing 451-500 of 12311 papers

Title | Status | Hype
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Code | 1
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving | Code | 1
RiskBench: A Scenario-based Benchmark for Risk Identification | Code | 1
MEDPSeg: Hierarchical polymorphic multitask learning for the segmentation of ground-glass opacities, consolidation, and pulmonary structures on computed tomography | Code | 1
Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations | Code | 1
Utilizing Explainability Techniques for Reinforcement Learning Model Assurance | Code | 1
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models | Code | 1
VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG | Code | 1
Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents | Code | 1
Labeling Neural Representations with Inverse Recognition | Code | 1
Physical Reasoning and Object Planning for Household Embodied Agents | Code | 1
From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models | Code | 1
Inherently Interpretable Time Series Classification via Multiple Instance Learning | Code | 1
DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation | Code | 1
ToolTalk: Evaluating Tool-Usage in a Conversational Setting | Code | 1
XplainLLM: A Knowledge-Augmented Dataset for Reliable Grounded Explanations in LLMs | Code | 1
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering | Code | 1
Real-Time Machine-Learning-Based Optimization Using Input Convex Long Short-Term Memory Network | Code | 1
Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime | Code | 1
MonoProb: Self-Supervised Monocular Depth Estimation with Interpretable Uncertainty | Code | 1
ADaPT: As-Needed Decomposition and Planning with Language Models | Code | 1
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation | Code | 1
ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents | Code | 1
Cal-DETR: Calibrated Detection Transformer | Code | 1
An algorithmic framework for synthetic cost-aware decision making in molecular design | Code | 1
DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation | Code | 1
Advances in Embodied Navigation Using Large Language Models: A Survey | Code | 1
Interpretable Prototype-based Graph Information Bottleneck | Code | 1
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Code | 1
Hierarchical Framework for Interpretable and Probabilistic Model-Based Safe Reinforcement Learning | Code | 1
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images | Code | 1
Tree Prompting: Efficient Task Adaptation without Fine-Tuning | Code | 1
EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities | Code | 1
Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes | Code | 1
Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis | Code | 1
On Statistical Learning of Branch and Bound for Vehicle Routing Optimization | Code | 1
QACHECK: A Demonstration System for Question-Guided Multi-Hop Fact-Checking | Code | 1
Explainable Image Similarity: Integrating Siamese Networks and Grad-CAM | Code | 1
Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT | Code | 1
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models | Code | 1
Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models | Code | 1
AvalonBench: Evaluating LLMs Playing the Game of Avalon | Code | 1
Deep Learning for Two-Stage Robust Integer Optimization | Code | 1
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets | Code | 1
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning | Code | 1
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use | Code | 1
Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation | Code | 1
Towards Robust Fidelity for Evaluating Explainability of Graph Neural Networks | Code | 1
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Code | 1
Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | - | Unverified