| Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation | Jun 24, 2020 | Benchmarking, Data Augmentation | —Unverified | 0 |
| General Scales Unlock AI Evaluation with Explanatory and Predictive Power | Mar 9, 2025 | Benchmarking, Specificity | —Unverified | 0 |
| Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design | Apr 14, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey | Jun 5, 2020 | Benchmarking, Experimental Design | —Unverified | 0 |
| A Survey of Parameters Associated with the Quality of Benchmarks in NLP | Oct 14, 2022 | Benchmarking | —Unverified | 0 |
| Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems | Nov 27, 2024 | AutoML, Benchmarking | —Unverified | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | Benchmarking, Math | —Unverified | 0 |
| Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking | Nov 6, 2024 | Benchmarking | —Unverified | 0 |
| Generation of Large District Heating System Models Using Open-Source Data and Tools: An Exemplary Workflow | Dec 18, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition | Mar 24, 2025 | Benchmarking, Food Recognition | —Unverified | 0 |