| MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf | Feb 5, 2025 | Benchmarking, Scheduling | Unverified | 0 |
| Energy & Force Regression on DFT Trajectories is Not Enough for Universal Machine Learning Interatomic Potentials | Feb 5, 2025 | Benchmarking | Unverified | 0 |
| Optimal PMU Placement for Kalman Filtering of DAE Power System Models | Feb 5, 2025 | Benchmarking, State Estimation | Unverified | 0 |
| Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications | Feb 5, 2025 | Benchmarking, Feature Engineering | Unverified | 0 |
| xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods | Feb 5, 2025 | Benchmarking | Unverified | 0 |
| TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics | Feb 5, 2025 | Benchmarking, Link Prediction | Code Available | 0 |
| No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets | Feb 4, 2025 | All, Benchmarking | Code Available | 0 |
| Evalita-LLM: Benchmarking Large Language Models on Italian | Feb 4, 2025 | Benchmarking, Multiple-choice | Unverified | 0 |
| Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models | Feb 4, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| A comparison of translation performance between DeepL and Supertext | Feb 4, 2025 | Benchmarking, Machine Translation | Code Available | 0 |