| CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives | Apr 15, 2025 | Benchmarking | —Unverified | 0 |
| A deep convolutional neural network model for rapid prediction of fluvial flood inundation | Jun 20, 2020 | Benchmarking, Computational Efficiency | —Unverified | 0 |
| Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models | May 16, 2025 | Benchmarking | —Unverified | 0 |
| DHP Benchmark: Are LLMs Good NLG Evaluators? | Aug 25, 2024 | Benchmarking, nlg evaluation | —Unverified | 0 |
| DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior | Apr 4, 2024 | Benchmarking, Image Restoration | —Unverified | 0 |
| CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering | Aug 1, 2023 | Benchmarking, Clustering | —Unverified | 0 |
| A biologically-inspired multi-modal evaluation of molecular generative machine learning | Aug 20, 2022 | Benchmarking, Drug Discovery | —Unverified | 0 |
| detrex: Benchmarking Detection Transformers | Jun 12, 2023 | Benchmarking, object-detection | —Unverified | 0 |
| Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | Nov 20, 2024 | Benchmarking | —Unverified | 0 |
| Determinants of Performance in European ATM -- How to Analyze a Diverse Industry | Feb 20, 2023 | Benchmarking, Management | —Unverified | 0 |