SOTAVerified

Decision Making

Papers

Showing 1-50 of 12,311 papers

Title | Status | Hype
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC | Code | 9
Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research | Code | 9
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion | Code | 9
FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models | Code | 9
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | Code | 7
Better than classical? The subtle art of benchmarking quantum machine learning models | Code | 7
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models | Code | 6
TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools | Code | 5
Neural Fields in Robotics: A Survey | Code | 5
Maia-2: A Unified Model for Human-AI Alignment in Chess | Code | 5
Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey | Code | 5
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | Code | 5
LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models | Code | 5
Differentiable Tree Search Network | Code | 5
Large Language Model based Multi-Agents: A Survey of Progress and Challenges | Code | 5
GenCast: Diffusion-based ensemble forecasting for medium-range weather | Code | 5
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | Code | 5
GraphCast: Learning skillful medium-range global weather forecasting | Code | 5
Deep Lake: a Lakehouse for Deep Learning | Code | 5
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Code | 4
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Code | 4
Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web | Code | 4
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents | Code | 4
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond | Code | 4
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning | Code | 4
AutoWebGLM: A Large Language Model-based Web Navigating Agent | Code | 4
A Survey on Large Language Model-Based Game Agents | Code | 4
Eureka: Human-Level Reward Design via Coding Large Language Models | Code | 4
Cognitive Architectures for Language Agents | Code | 4
AgentBench: Evaluating LLMs as Agents | Code | 4
TorchRL: A data-driven decision-making library for PyTorch | Code | 4
pgmpy: A Python Toolkit for Bayesian Networks | Code | 4
Reflexion: Language Agents with Verbal Reinforcement Learning | Code | 4
Mastering Diverse Domains through World Models | Code | 4
Constitutional AI: Harmlessness from AI Feedback | Code | 4
ReAct: Synergizing Reasoning and Acting in Language Models | Code | 4
A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning | Code | 3
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Code | 3
Playing Non-Embedded Card-Based Games with Reinforcement Learning | Code | 3
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Code | 3
Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective | Code | 3
A Survey on the Optimization of Large Language Model-based Agents | Code | 3
Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems | Code | 3
Automated Hypothesis Validation with Agentic Sequential Falsifications | Code | 3
Rethinking Early Stopping: Refine, Then Calibrate | Code | 3
MineStudio: A Streamlined Package for Minecraft AI Agent Development | Code | 3
Embodied CoT Distillation From LLM To Off-the-shelf Agents | Code | 3
AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games | Code | 3
Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Code | 3
Game-theoretic LLM: Agent Workflow for Negotiation Games | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | - | Unverified