| PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series | Nov 21, 2024 | Anomaly DetectionBenchmarking | CodeCode Available | 0 |
| Multi-Agent Environments for Vehicle Routing Problems | Nov 21, 2024 | Benchmarkingreinforcement-learning | CodeCode Available | 1 |
| Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Nov 21, 2024 | ArticlesBenchmarking | CodeCode Available | 0 |
| Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking | Nov 20, 2024 | BenchmarkingLanguage Modeling | —Unverified | 0 |
| Delta-Influence: Unlearning Poisons via Influence Functions | Nov 20, 2024 | AttributeBenchmarking | CodeCode Available | 0 |
| Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | Nov 20, 2024 | Benchmarking | —Unverified | 0 |
| VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Nov 20, 2024 | BenchmarkingImage Generation | CodeCode Available | 5 |
| BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | Nov 20, 2024 | BenchmarkingPoint Cloud Segmentation | —Unverified | 0 |
| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Nov 20, 2024 | BenchmarkingNetHack | —Unverified | 0 |
| The Moral Mind(s) of Large Language Models | Nov 19, 2024 | BenchmarkingDecision Making | —Unverified | 0 |