| featsel: A framework for benchmarking of feature selection algorithms and cost functions | Jul 19, 2017 | Benchmarking, Computational Efficiency | Code Available | 1 | 5 |
| SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | May 14, 2024 | Benchmarking, Multiple-choice | Code Available | 1 | 5 |
| FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things | Sep 29, 2023 | Benchmarking, Federated Learning | Code Available | 1 | 5 |
| FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks | Nov 22, 2021 | Benchmarking, Federated Learning | Code Available | 1 | 5 |
| Beyond neural scaling laws: beating power law scaling via data pruning | Jun 29, 2022 | Benchmarking | Code Available | 1 | 5 |
| Beyond Normal: On the Evaluation of Mutual Information Estimators | Jun 19, 2023 | Benchmarking, Domain Generalization | Code Available | 1 | 5 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Jun 22, 2023 | Arithmetic Reasoning, Benchmarking | Code Available | 1 | 5 |
| KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Jun 15, 2023 | Benchmarking, Hallucination | Code Available | 1 | 5 |
| LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Sep 28, 2023 | Benchmarking | Code Available | 1 | 5 |
| LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement | Apr 22, 2025 | Benchmarking, Language Modeling | Code Available | 1 | 5 |