SOTAVerified

Decision Making

Papers

Showing 351-400 of 12311 papers

Title | Status | Hype
Pursuing Overall Welfare in Federated Learning through Sequential Decision Making | Code | 1
In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought | Code | 1
G-Transformer for Conditional Average Potential Outcome Estimation over Time | Code | 1
LLM experiments with simulation: Large Language Model Multi-Agent System for Simulation Model Parametrization in Digital Twins | Code | 1
Rethinking Transformers in Solving POMDPs | Code | 1
GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning | Code | 1
STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making | Code | 1
Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate | Code | 1
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment | Code | 1
PATE: Proximity-Aware Time series anomaly Evaluation | Code | 1
Movie Revenue Prediction using Machine Learning Models | Code | 1
SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations | Code | 1
Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees | Code | 1
FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization | Code | 1
Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation | Code | 1
Argumentative Large Language Models for Explainable and Contestable Claim Verification | Code | 1
UCB-driven Utility Function Search for Multi-objective Reinforcement Learning | Code | 1
CoSense3D: an Agent-based Efficient Learning Framework for Collective Perception | Code | 1
CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations | Code | 1
Large Language Models in the Clinic: A Comprehensive Benchmark | Code | 1
Conformal Predictive Systems Under Covariate Shift | Code | 1
BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis | Code | 1
Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning | Code | 1
Open-Ended Wargames with Large Language Models | Code | 1
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes | Code | 1
LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain | Code | 1
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising | Code | 1
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Code | 1
Language Models are Spacecraft Operators | Code | 1
Linguistic Calibration of Long-Form Generations | Code | 1
Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces | Code | 1
Optimization-based Prompt Injection Attack to LLM-as-a-Judge | Code | 1
MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation | Code | 1
Towards Learning Contrast Kinetics with Multi-Condition Latent Diffusion Models | Code | 1
Uncertainty quantification for data-driven weather models | Code | 1
Probabilistic Calibration by Design for Neural Network Regression | Code | 1
LLM Guided Evolution - The Automation of Models Advancing Models | Code | 1
Driving Style Alignment for LLM-powered Driver Agent | Code | 1
Beyond Pixels: Enhancing LIME with Hierarchical Features and Segmentation Foundation Models | Code | 1
Reinforced Sequential Decision-Making for Sepsis Treatment: The POSNEGDM Framework with Mortality Classifier and Transformer | Code | 1
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents | Code | 1
Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations | Code | 1
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding | Code | 1
AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation | Code | 1
ComTraQ-MPC: Meta-Trained DQN-MPC Integration for Trajectory Tracking with Limited Active Localization Updates | Code | 1
Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents | Code | 1
MemoNav: Working Memory Model for Visual Navigation | Code | 1
Large Language Models are Learnable Planners for Long-Term Recommendation | Code | 1
Benchmarking Data Science Agents | Code | 1
How Can LLM Guide RL? A Value-Based Approach | Code | 1
Page 8 of 247

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified