SOTAVerified

Decision Making

Papers

Showing 391-400 of 12311 papers

Title | Status | Hype
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents | Code | 1
Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations | Code | 1
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding | Code | 1
AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation | Code | 1
ComTraQ-MPC: Meta-Trained DQN-MPC Integration for Trajectory Tracking with Limited Active Localization Updates | Code | 1
Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents | Code | 1
MemoNav: Working Memory Model for Visual Navigation | Code | 1
Large Language Models are Learnable Planners for Long-Term Recommendation | Code | 1
Benchmarking Data Science Agents | Code | 1
How Can LLM Guide RL? A Value-Based Approach | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified