SOTAVerified

Decision Making

Papers

Showing 176-200 of 12311 papers

Title | Status | Hype
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training | Code | 2
Cross-Prediction-Powered Inference | Code | 2
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models | Code | 2
ExpeL: LLM Agents Are Experiential Learners | Code | 2
MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models | Code | 2
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | Code | 2
Cumulative Reasoning with Large Language Models | Code | 2
Global birdsong embeddings enable superior transfer learning for bioacoustic classification | Code | 2
Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX | Code | 2
Adversarial attacks and defenses in explainable artificial intelligence: A survey | Code | 2
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft | Code | 2
Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning | Code | 2
Training Diffusion Models with Reinforcement Learning | Code | 2
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | Code | 2
Large AI Models in Health Informatics: Applications, Challenges, and the Future | Code | 2
Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow | Code | 2
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning | Code | 2
ADAPT: Action-aware Driving Caption Transformer | Code | 2
Towards Reasoning in Large Language Models: A Survey | Code | 2
ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency | Code | 2
Dungeons and Data: A Large-Scale NetHack Dataset | Code | 2
PlanT: Explainable Planning Transformers via Object-Level Representations | Code | 2
Harfang3D Dog-Fight Sandbox: A Reinforcement Learning Research Platform for the Customized Control Tasks of Fighter Aircrafts | Code | 2
HierarchicalForecast: A Reference Framework for Hierarchical Forecasting in Python | Code | 2
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents | Code | 2
Page 8 of 493

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 |  | Unverified