| SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It) | Jun 25, 2024 | Benchmarking, Experimental Design | Code Available | 1 | 5 |
| RGB-D Indiscernible Object Counting in Underwater Scenes | Apr 23, 2023 | Benchmarking, Depth Estimation | Code Available | 1 | 5 |
| AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models | Jun 24, 2024 | Benchmarking, Data Augmentation | Code Available | 1 | 5 |
| DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training | Mar 13, 2020 | Benchmarking, Quantization | Code Available | 1 | 5 |
| Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | May 30, 2024 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding | Sep 11, 2020 | Benchmarking, Diversity | Code Available | 1 | 5 |
| scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Jun 10, 2025 | Benchmarking, Data Augmentation | Code Available | 1 | 5 |
| Benchmark on Drug Target Interaction Modeling from a Structure Perspective | Jul 4, 2024 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities | May 19, 2025 | Automated Theorem Proving, Benchmarking | Code Available | 1 | 5 |
| Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments | Jul 19, 2022 | Benchmarking, Experimental Design | Code Available | 1 | 5 |