| IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Jul 13, 2023 | Benchmarking, Graph Embedding | Code Available | 1 |
| A Comprehensive Overview of Large Language Models | Jul 12, 2023 | Benchmarking | Code Available | 1 |
| AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Jul 11, 2023 | Benchmarking | Code Available | 1 |
| Benchmarking Algorithms for Federated Domain Generalization | Jul 11, 2023 | Benchmarking, Diversity | Code Available | 1 |
| A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Jul 10, 2023 | Age Estimation, Benchmarking | Code Available | 1 |
| Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification | Jul 6, 2023 | Benchmarking, Domain Adaptation | Code Available | 1 |
| Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection | Jun 28, 2023 | Benchmarking, Data Augmentation | Code Available | 1 |
| SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes | Jun 27, 2023 | Benchmarking, Motion Planning | Code Available | 1 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Jun 22, 2023 | Arithmetic Reasoning, Benchmarking | Code Available | 1 |
| Challenges and Opportunities in Improving Worst-Group Generalization in Presence of Spurious Features | Jun 21, 2023 | Benchmarking, Model Selection | Code Available | 1 |
| Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Jun 21, 2023 | 3D-Aware Image Synthesis, Benchmarking | Code Available | 1 |
| GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection | Jun 21, 2023 | Anomaly Detection, Benchmarking | Code Available | 1 |
| VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution | Jun 21, 2023 | Benchmarking, Retrieval | Code Available | 1 |
| IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL | Jun 20, 2023 | Benchmarking, Management | Code Available | 1 |
| Geometric Deep Learning for Structure-Based Drug Design: A Survey | Jun 20, 2023 | Benchmarking, Deep Learning | Code Available | 1 |
| causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery | Jun 19, 2023 | Benchmarking, Causal Discovery | Code Available | 1 |
| Beyond Normal: On the Evaluation of Mutual Information Estimators | Jun 19, 2023 | Benchmarking, Domain Generalization | Code Available | 1 |
| CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Jun 18, 2023 | Benchmarking, Retrieval | Code Available | 1 |
| Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking | Jun 18, 2023 | Benchmarking, Link Prediction | Code Available | 1 |
| OpenDataVal: a Unified Benchmark for Data Valuation | Jun 18, 2023 | Benchmarking, Data Valuation | Code Available | 1 |
| LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning | Jun 16, 2023 | Active Learning, Benchmarking | Code Available | 1 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available | 1 |
| Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Jun 15, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 |
| FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods | Jun 15, 2023 | Benchmarking, Fairness | Code Available | 1 |
| MLonMCU: TinyML Benchmarking with Fast Retargeting | Jun 15, 2023 | Benchmarking | Code Available | 1 |
| Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials | Jun 15, 2023 | Benchmarking, Computational chemistry | Code Available | 1 |
| PaReprop: Fast Parallelized Reversible Backpropagation | Jun 15, 2023 | Benchmarking | Code Available | 1 |
| KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Jun 15, 2023 | Benchmarking, Hallucination | Code Available | 1 |
| Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models | Jun 15, 2023 | Benchmarking, Question Answering | Code Available | 1 |
| AQuA: A Benchmarking Tool for Label Quality Assessment | Jun 15, 2023 | Benchmarking, Label Error Detection | Code Available | 1 |
| NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics | Jun 9, 2023 | Benchmarking, Dataset Generation | Code Available | 1 |
| Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML | Jun 8, 2023 | Benchmarking, Kidney Function | Code Available | 1 |
| On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Jun 7, 2023 | Benchmarking, Prompt Engineering | Code Available | 1 |
| RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems | Jun 5, 2023 | Benchmarking, C++ code | Code Available | 1 |
| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Jun 5, 2023 | Benchmarking, Multiple-choice | Code Available | 1 |
| Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling | Jun 5, 2023 | Benchmarking, Denoising | Code Available | 1 |
| TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain | Jun 3, 2023 | Benchmarking, Decoder | Code Available | 1 |
| Spatially Resolved Gene Expression Prediction from H&E Histology Images via Bi-modal Contrastive Learning | Jun 2, 2023 | Benchmarking, Contrastive Learning | Code Available | 1 |
| BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Jun 2, 2023 | Benchmarking, Language Acquisition | Code Available | 1 |
| Multilingual Conceptual Coverage in Text-to-Image Models | Jun 2, 2023 | Benchmarking | Code Available | 1 |
| Improving and Benchmarking Offline Reinforcement Learning Algorithms | Jun 1, 2023 | Attribute, Benchmarking | Code Available | 1 |
| End-to-end Knowledge Retrieval with Multi-modal Queries | Jun 1, 2023 | Benchmarking, Cross-Modal Retrieval | Code Available | 1 |
| Accurate and Efficient Structural Ensemble Generation of Macrocyclic Peptides using Internal Coordinate Diffusion | May 30, 2023 | Benchmarking, Diversity | Code Available | 1 |
| IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics | May 30, 2023 | Benchmarking | Code Available | 1 |
| SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models | May 30, 2023 | Benchmarking, Code Generation | Code Available | 1 |
| Decoding the Underlying Meaning of Multimodal Hateful Memes | May 28, 2023 | Benchmarking, Hateful Meme Classification | Code Available | 1 |
| Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financial Tasks | May 26, 2023 | Benchmarking | Code Available | 1 |
| KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration | May 25, 2023 | Benchmarking, Face Recognition | Code Available | 1 |
| ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment | May 23, 2023 | Benchmarking, Cross-Lingual Transfer | Code Available | 1 |
| Exploring Large Language Models for Classical Philology | May 23, 2023 | Benchmarking, Decoder | Code Available | 1 |