| AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Jul 11, 2023 | Benchmarking | Code Available | 1 | 5 |
| Deep learning model solves change point detection for multiple change types | Apr 15, 2022 | Benchmarking, Change Point Detection | Code Available | 1 | 5 |
| Fast hyperboloid decision tree algorithms | Oct 20, 2023 | Benchmarking, Riemannian optimization | Code Available | 1 | 5 |
| FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods | Jun 15, 2023 | Benchmarking, Fairness | Code Available | 1 | 5 |
| Working Memory Capacity of ChatGPT: An Empirical Study | Apr 30, 2023 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| Benchmarking Natural Language Understanding Services for building Conversational Agents | Mar 13, 2019 | Benchmarking, General Classification | Code Available | 1 | 5 |
| DependEval: Benchmarking LLMs for Repository Dependency Understanding | Mar 9, 2025 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Mar 20, 2023 | Benchmarking, De-identification | Code Available | 1 | 5 |
| Benchmarking Neural Network Generalization for Grammar Induction | Aug 16, 2023 | Benchmarking | Code Available | 1 | 5 |
| Evaluation of large language models for discovery of gene set function | Sep 7, 2023 | Benchmarking, Language Modelling | Code Available | 1 | 5 |