| IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Jul 13, 2023 | Benchmarking, Graph Embedding | Code Available | 1 |
| A Comprehensive Overview of Large Language Models | Jul 12, 2023 | Benchmarking | Code Available | 1 |
| AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Jul 11, 2023 | Benchmarking | Code Available | 1 |
| Benchmarking Algorithms for Federated Domain Generalization | Jul 11, 2023 | Benchmarking, Diversity | Code Available | 1 |
| A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Jul 10, 2023 | Age Estimation, Benchmarking | Code Available | 1 |
| Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification | Jul 6, 2023 | Benchmarking, Domain Adaptation | Code Available | 1 |
| Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection | Jun 28, 2023 | Benchmarking, Data Augmentation | Code Available | 1 |
| SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes | Jun 27, 2023 | Benchmarking, Motion Planning | Code Available | 1 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Jun 22, 2023 | Arithmetic Reasoning, Benchmarking | Code Available | 1 |
| VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution | Jun 21, 2023 | Benchmarking, Retrieval | Code Available | 1 |
| GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection | Jun 21, 2023 | Anomaly Detection, Benchmarking | Code Available | 1 |
| Challenges and Opportunities in Improving Worst-Group Generalization in Presence of Spurious Features | Jun 21, 2023 | Benchmarking, Model Selection | Code Available | 1 |
| Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Jun 21, 2023 | 3D-Aware Image Synthesis, Benchmarking | Code Available | 1 |
| IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL | Jun 20, 2023 | Benchmarking, Management | Code Available | 1 |
| Geometric Deep Learning for Structure-Based Drug Design: A Survey | Jun 20, 2023 | Benchmarking, Deep Learning | Code Available | 1 |
| Beyond Normal: On the Evaluation of Mutual Information Estimators | Jun 19, 2023 | Benchmarking, Domain Generalization | Code Available | 1 |
| causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery | Jun 19, 2023 | Benchmarking, Causal Discovery | Code Available | 1 |
| Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking | Jun 18, 2023 | Benchmarking, Link Prediction | Code Available | 1 |
| CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Jun 18, 2023 | Benchmarking, Retrieval | Code Available | 1 |
| OpenDataVal: a Unified Benchmark for Data Valuation | Jun 18, 2023 | Benchmarking, Data Valuation | Code Available | 1 |
| LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning | Jun 16, 2023 | Active Learning, Benchmarking | Code Available | 1 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available | 1 |
| FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods | Jun 15, 2023 | Benchmarking, Fairness | Code Available | 1 |
| Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials | Jun 15, 2023 | Benchmarking, Computational chemistry | Code Available | 1 |
| KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Jun 15, 2023 | Benchmarking, Hallucination | Code Available | 1 |