SOTAVerified

Decision Making

Papers

Showing 151–200 of 12311 papers

| Title | Status | Hype |
| --- | --- | --- |
| Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions | Code | 2 |
| PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain | Code | 2 |
| RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model | Code | 2 |
| Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent | Code | 2 |
| Fairness Evaluation for Uplift Modeling in the Absence of Ground Truth | Code | 2 |
| AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies | Code | 2 |
| Position: What Can Large Language Models Tell Us about Time Series Analysis | Code | 2 |
| True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning | Code | 2 |
| CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents | Code | 2 |
| Graph-of-Thought: Utilizing Large Language Models to Solve Complex and Dynamic Business Problems | Code | 2 |
| ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning | Code | 2 |
| LLMLight: Large Language Models as Traffic Signal Control Agents | Code | 2 |
| LingoQA: Visual Question Answering for Autonomous Driving | Code | 2 |
| FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design | Code | 2 |
| Tactics2D: A Highly Modular and Extensible Simulator for Driving Decision-making | Code | 2 |
| On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | Code | 2 |
| ProAgent: From Robotic Process Automation to Agentic Process Automation | Code | 2 |
| Vision Language Models in Autonomous Driving: A Survey and Outlook | Code | 2 |
| Octopus: Embodied Vision-Language Programmer from Environmental Feedback | Code | 2 |
| Distributional Soft Actor-Critic with Three Refinements | Code | 2 |
| Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Code | 2 |
| MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation | Code | 2 |
| Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | Code | 2 |
| AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models | Code | 2 |
| GPT-Driver: Learning to Drive with GPT | Code | 2 |
| Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training | Code | 2 |
| Cross-Prediction-Powered Inference | Code | 2 |
| DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models | Code | 2 |
| ExpeL: LLM Agents Are Experiential Learners | Code | 2 |
| MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models | Code | 2 |
| BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | Code | 2 |
| Cumulative Reasoning with Large Language Models | Code | 2 |
| Global birdsong embeddings enable superior transfer learning for bioacoustic classification | Code | 2 |
| Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX | Code | 2 |
| Adversarial attacks and defenses in explainable artificial intelligence: A survey | Code | 2 |
| STEVE-1: A Generative Model for Text-to-Behavior in Minecraft | Code | 2 |
| Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning | Code | 2 |
| Training Diffusion Models with Reinforcement Learning | Code | 2 |
| AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | Code | 2 |
| Large AI Models in Health Informatics: Applications, Challenges, and the Future | Code | 2 |
| Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow | Code | 2 |
| Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning | Code | 2 |
| ADAPT: Action-aware Driving Caption Transformer | Code | 2 |
| Towards Reasoning in Large Language Models: A Survey | Code | 2 |
| ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency | Code | 2 |
| Dungeons and Data: A Large-Scale NetHack Dataset | Code | 2 |
| PlanT: Explainable Planning Transformers via Object-Level Representations | Code | 2 |
| Harfang3D Dog-Fight Sandbox: A Reinforcement Learning Research Platform for the Customized Control Tasks of Fighter Aircrafts | Code | 2 |
| HierarchicalForecast: A Reference Framework for Hierarchical Forecasting in Python | Code | 2 |
| WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified |