| Uncertainty estimation of machine learning spatial precipitation predictions from satellite data | Nov 13, 2023 | Benchmarking, Feature Importance | Unverified | 0 |
| The Disagreement Problem in Faithfulness Metrics | Nov 13, 2023 | Benchmarking, Explainable Artificial Intelligence | Unverified | 0 |
| WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models | Nov 13, 2023 | Benchmarking, Instruction Following | Code Available | 1 |
| Flames: Benchmarking Value Alignment of LLMs in Chinese | Nov 12, 2023 | Benchmarking, Fairness | Code Available | 1 |
| Identification of vortex in unstructured mesh with graph neural networks | Nov 11, 2023 | Benchmarking, Graph Generation | Unverified | 0 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | Benchmarking, Cloud Computing | Code Available | 1 |
| MultiIoT: Benchmarking Machine Learning for the Internet of Things | Nov 10, 2023 | Benchmarking, Representation Learning | Code Available | 1 |
| SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification | Nov 9, 2023 | Benchmarking, Instance Segmentation | Unverified | 0 |
| TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs | Nov 9, 2023 | Benchmarking, Question Answering | Code Available | 1 |
| An efficiency analysis of Spanish airports | Nov 8, 2023 | Benchmarking | Unverified | 0 |