| Hierarchical graph neural nets can capture long-range interactions | Jul 15, 2021 | Benchmarking, Molecular Property Prediction | Code Available | 1 | 5 |
| IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics | May 30, 2023 | Benchmarking | Code Available | 1 | 5 |
| A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images | Oct 25, 2022 | Benchmarking, Few-Shot Object Detection | Code Available | 1 | 5 |
| Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | May 13, 2021 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data | Feb 27, 2024 | Benchmarking | Code Available | 1 | 5 |
| Image Colorization: A Survey and Dataset | Aug 25, 2020 | Benchmarking, Colorization | Code Available | 1 | 5 |
| German Text Embedding Clustering Benchmark | Jan 5, 2024 | Benchmarking, Clustering | Code Available | 1 | 5 |
| A Closer Look at Mortality Risk Prediction from Electrocardiograms | Jun 24, 2024 | Benchmarking, Prediction | Code Available | 1 | 5 |
| Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Mar 6, 2020 | Benchmarking, Image Reconstruction | Code Available | 1 | 5 |
| A global analysis of metrics used for measuring performance in natural language processing | Apr 25, 2022 | Benchmarking, Machine Translation | Code Available | 1 | 5 |
| A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Mar 31, 2023 | Benchmarking, Causal Discovery | Code Available | 1 | 5 |
| Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Jul 16, 2024 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| Benchmarking Large Language Models for News Summarization | Jan 31, 2023 | Benchmarking, News Summarization | Code Available | 1 | 5 |
| Benchmarking Multidomain English-Indonesian Machine Translation | May 1, 2020 | Benchmarking, Machine Translation | Code Available | 1 | 5 |
| AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction | Jul 25, 2024 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| Geoclidean: Few-Shot Generalization in Euclidean Geometry | Nov 30, 2022 | Benchmarking | Code Available | 1 | 5 |
| German's Next Language Model | Oct 21, 2020 | Benchmarking, Document Classification | Code Available | 1 | 5 |
| GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras | Jun 11, 2025 | Benchmarking | Code Available | 1 | 5 |
| FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Jan 1, 2024 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models | May 26, 2025 | Benchmarking, RAG | Code Available | 1 | 5 |
| A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization | Aug 10, 2023 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| BiBench: Benchmarking and Analyzing Network Binarization | Jan 26, 2023 | Benchmarking, Binarization | Code Available | 1 | 5 |
| Benchmarking Robustness of Text-Image Composed Retrieval | Nov 24, 2023 | Attribute, Benchmarking | Code Available | 1 | 5 |
| Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities | May 19, 2025 | Automated Theorem Proving, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Robustness to Adversarial Image Obfuscations | Jan 30, 2023 | Benchmarking | Code Available | 1 | 5 |