SOTAVerified

Decision Making

Papers

Showing 901–925 of 12,311 papers

| Title | Status | Hype |
| --- | --- | --- |
| Med-gte-hybrid: A contextual embedding transformer model for extracting actionable information from clinical texts | | 0 |
| Graph Attention Convolutional U-NET: A Semantic Segmentation Model for Identifying Flooded Areas | | 0 |
| Exploring Embodied Multimodal Large Models: Development, Datasets, and Future Directions | | 0 |
| A Knowledge Distillation-Based Approach to Enhance Transparency of Classifier Models | Code | 0 |
| Alignment, Agency and Autonomy in Frontier AI: A Systems Engineering Perspective | | 0 |
| PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC | Code | 9 |
| Reinforcement Learning for Ultrasound Image Analysis A Comprehensive Review of Advances and Applications | | 0 |
| SPRIG: Stackelberg Perception-Reinforcement Learning with Internal Game Dynamics | | 0 |
| An Interpretable Machine Learning Approach to Understanding the Relationships between Solar Flares and Source Active Regions | | 0 |
| Multi-Objective Causal Bayesian Optimization | Code | 1 |
| Investigating the Impact of LLM Personality on Cognitive Bias Manifestation in Automated Decision-Making Tasks | | 0 |
| The Impact and Feasibility of Self-Confidence Shaping for AI-Assisted Decision-Making | | 0 |
| Human Misperception of Generative-AI Alignment: A Laboratory Experiment | | 0 |
| Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation | | 0 |
| How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation | Code | 1 |
| MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | | 0 |
| Online detection of forecast model inadequacies using forecast errors | | 0 |
| STeCa: Step-level Trajectory Calibration for LLM Agent Learning | Code | 1 |
| Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | | 0 |
| Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition | | 0 |
| Human-Artificial Interaction in the Age of Agentic AI: A System-Theoretical Approach | | 0 |
| LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems | | 0 |
| Benchmarking LLMs for Political Science: A United Nations Perspective | Code | 1 |
| Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks | Code | 0 |
| RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified |