| Benchmarking the Myopic Trap: Positional Bias in Information Retrieval | May 20, 2025 | Benchmarking, Information Retrieval | Code Available | 5 |
| NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI | May 20, 2025 | Anomaly Localization, Benchmarking | Unverified | 0 |
| Benchmarking data encoding methods in Quantum Machine Learning | May 20, 2025 | Benchmarking, Quantum Machine Learning | Unverified | 0 |
| SlangDIT: Benchmarking LLMs in Interpretative Slang Translation | May 20, 2025 | Benchmarking, Sentence | Unverified | 0 |
| SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas | May 20, 2025 | Benchmarking, Logical Reasoning | Unverified | 0 |
| NavBench: A Unified Robotics Benchmark for Reinforcement Learning-Based Autonomous Navigation | May 20, 2025 | Autonomous Navigation, Benchmarking | Unverified | 0 |
| SzCORE as a benchmark: report from the seizure detection challenge at the 2025 AI in Epilepsy and Neurological Disorders Conference | May 19, 2025 | Benchmarking, EEG | Unverified | 0 |
| HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity | May 19, 2025 | Benchmarking, feature selection | Code Available | 0 |
| Benchmarking MOEAs for solving continuous multi-objective RL problems | May 19, 2025 | Benchmarking, Evolutionary Algorithms | Code Available | 0 |
| Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference | May 19, 2025 | Benchmarking, Causal Inference | Unverified | 0 |