| IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Jul 13, 2023 | Benchmarking, Graph Embedding | Code Available | 1 |
| A Comprehensive Overview of Large Language Models | Jul 12, 2023 | Benchmarking | Code Available | 1 |
| AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Jul 11, 2023 | Benchmarking | Code Available | 1 |
| Benchmarking Algorithms for Federated Domain Generalization | Jul 11, 2023 | Benchmarking, Diversity | Code Available | 1 |
| A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Jul 10, 2023 | Age Estimation, Benchmarking | Code Available | 1 |
| Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification | Jul 6, 2023 | Benchmarking, Domain Adaptation | Code Available | 1 |
| Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection | Jun 28, 2023 | Benchmarking, Data Augmentation | Code Available | 1 |
| SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes | Jun 27, 2023 | Benchmarking, Motion Planning | Code Available | 1 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Jun 22, 2023 | Arithmetic Reasoning, Benchmarking | Code Available | 1 |
| VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution | Jun 21, 2023 | Benchmarking, Retrieval | Code Available | 1 |