| Data-driven Power Flow Linearization: Simulation | Jun 10, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Jun 10, 2024 | Benchmarking, Decoder | Code Available | 0 |
| INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition | Jun 10, 2024 | Benchmarking, Emotion Recognition | Code Available | 0 |
| DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents | Jun 10, 2024 | Benchmarking, Scientific Discovery | Code Available | 3 |
| Can Language Models Serve as Text-Based World Simulators? | Jun 10, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking | Jun 10, 2024 | Benchmarking, Econometrics | Unverified | 0 |
| TopoBench: A Framework for Benchmarking Topological Deep Learning | Jun 9, 2024 | Benchmarking, Deep Learning | Code Available | 3 |
| Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking | Jun 9, 2024 | Benchmarking, Drug Discovery | Code Available | 1 |
| QGEval: Benchmarking Multi-dimensional Evaluation for Question Generation | Jun 9, 2024 | Benchmarking, Question Generation | Code Available | 1 |
| ICU-Sepsis: A Benchmark MDP Built from Real Medical Data | Jun 9, 2024 | Benchmarking, Management | Code Available | 1 |