| Risk Aware Benchmarking of Large Language Models | Oct 11, 2023 | Benchmarking, Econometrics | Unverified | 0 |
| Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms | Oct 11, 2023 | Benchmarking, Denoising | Unverified | 0 |
| ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons | Oct 11, 2023 | Benchmarking, Position | Code Available | 2 |
| BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision | Oct 10, 2023 | Acute Stroke Lesion Segmentation, Benchmarking | Code Available | 0 |
| CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods | Oct 10, 2023 | Benchmarking, Prediction | Unverified | 0 |
| What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models | Oct 10, 2023 | Benchmarking, Code Generation | Code Available | 1 |
| Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach | Oct 10, 2023 | Benchmarking, Code Generation | Code Available | 1 |
| On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets | Oct 10, 2023 | All, Benchmarking | Unverified | 0 |
| Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization | Oct 9, 2023 | Benchmarking | Unverified | 0 |
| Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis | Oct 9, 2023 | Benchmarking, Multivariate Time Series Forecasting | Code Available | 3 |