SOTAVerified

Decision Making

Papers

Showing 451–475 of 12311 papers

Title | Status | Hype
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Code | 1
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving | Code | 1
RiskBench: A Scenario-based Benchmark for Risk Identification | Code | 1
MEDPSeg: Hierarchical polymorphic multitask learning for the segmentation of ground-glass opacities, consolidation, and pulmonary structures on computed tomography | Code | 1
Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations | Code | 1
Utilizing Explainability Techniques for Reinforcement Learning Model Assurance | Code | 1
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models | Code | 1
VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG | Code | 1
Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents | Code | 1
Physical Reasoning and Object Planning for Household Embodied Agents | Code | 1
Labeling Neural Representations with Inverse Recognition | Code | 1
From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models | Code | 1
DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation | Code | 1
Inherently Interpretable Time Series Classification via Multiple Instance Learning | Code | 1
ToolTalk: Evaluating Tool-Usage in a Conversational Setting | Code | 1
XplainLLM: A Knowledge-Augmented Dataset for Reliable Grounded Explanations in LLMs | Code | 1
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering | Code | 1
Real-Time Machine-Learning-Based Optimization Using Input Convex Long Short-Term Memory Network | Code | 1
Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime | Code | 1
MonoProb: Self-Supervised Monocular Depth Estimation with Interpretable Uncertainty | Code | 1
ADaPT: As-Needed Decomposition and Planning with Language Models | Code | 1
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation | Code | 1
ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents | Code | 1
Cal-DETR: Calibrated Detection Transformer | Code | 1
An algorithmic framework for synthetic cost-aware decision making in molecular design | Code | 1
Page 19 of 493

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified