| CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Jun 18, 2023 | Benchmarking, Retrieval | Code Available | 1 | 5 |
| Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Apr 21, 2023 | Benchmarking | Code Available | 1 | 5 |
| RADAR: Benchmarking Language Models on Imperfect Tabular Data | Jun 9, 2025 | Benchmarking, Missing Values | Code Available | 1 | 5 |
| Benchmarking Meaning Representations in Neural Semantic Parsing | Nov 1, 2020 | Benchmarking, Semantic Parsing | Code Available | 1 | 5 |
| CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Jun 5, 2024 | Benchmarking, energy management | Code Available | 1 | 5 |
| Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Aug 8, 2023 | Benchmarking, GPU | Code Available | 1 | 5 |
| Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Jun 14, 2020 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 | 5 |
| Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Jul 3, 2024 | Benchmarking, Object | Code Available | 1 | 5 |
| Benchmarking End-to-End Behavioural Cloning on Video Games | Apr 2, 2020 | Behavioural cloning, Benchmarking | Code Available | 1 | 5 |
| DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Aug 11, 2023 | Benchmarking, Diversity | Code Available | 1 | 5 |
| Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Jul 10, 2024 | Benchmarking, Diagnostic | Code Available | 1 | 5 |
| Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Nov 6, 2023 | Benchmarking, Knowledge Base Question Answering | Code Available | 1 | 5 |
| An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction | Sep 4, 2019 | Benchmarking, General Classification | Code Available | 1 | 5 |
| Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction | Sep 24, 2023 | 3D Shape Reconstruction, Anatomy | Code Available | 1 | 5 |
| Combinatorial Optimization with Policy Adaptation using Latent Space Search | Nov 13, 2023 | Benchmarking, Combinatorial Optimization | Code Available | 1 | 5 |
| ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Jun 15, 2025 | Benchmarking | Code Available | 1 | 5 |
| CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Feb 4, 2023 | Adversarial Attack, Adversarial Robustness | Code Available | 1 | 5 |
| CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Sep 16, 2020 | Benchmarking, Knowledge Graph Completion | Code Available | 1 | 5 |
| Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Feb 27, 2025 | Benchmarking | Code Available | 1 | 5 |
| An Empirical Study on Google Research Football Multi-agent Scenarios | May 16, 2023 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 1 | 5 |
| Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark | Mar 23, 2024 | Benchmarking, Image to Point Cloud Registration | Code Available | 1 | 5 |
| CodeUpdateArena: Benchmarking Knowledge Editing on API Updates | Jul 8, 2024 | Benchmarking, knowledge editing | Code Available | 1 | 5 |
| New Protocols and Negative Results for Textual Entailment Data Collection | Apr 24, 2020 | Benchmarking, Diversity | Code Available | 1 | 5 |
| Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark | Mar 9, 2024 | Benchmarking, Fairness | Code Available | 1 | 5 |
| An Empirical Study of GPT-4o Image Generation Capabilities | Apr 8, 2025 | Benchmarking, Image Generation | Code Available | 1 | 5 |