| Fast hyperboloid decision tree algorithms | Oct 20, 2023 | Benchmarking, Riemannian optimization | Code Available | 1 | 5 |
| BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation | May 7, 2022 | 6D Pose Estimation, Benchmarking | Code Available | 1 | 5 |
| Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Jul 16, 2024 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| BiBench: Benchmarking and Analyzing Network Binarization | Jan 26, 2023 | Benchmarking, Binarization | Code Available | 1 | 5 |
| FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Jan 1, 2024 | Benchmarking | Code Available | 1 | 5 |
| ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots | Sep 16, 2022 | Benchmarking, Question Answering | Code Available | 1 | 5 |
| ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory | Aug 24, 2020 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Graph Neural Networks on Dynamic Link Prediction | Sep 29, 2021 | Benchmarking, Dynamic Link Prediction | Code Available | 1 | 5 |
| Benchmarking Graph Neural Networks for FMRI analysis | Nov 16, 2022 | Benchmarking | Code Available | 1 | 5 |
| Beyond neural scaling laws: beating power law scaling via data pruning | Jun 29, 2022 | Benchmarking | Code Available | 1 | 5 |
| Beyond Normal: On the Evaluation of Mutual Information Estimators | Jun 19, 2023 | Benchmarking, Domain Generalization | Code Available | 1 | 5 |
| Formalizing Multimedia Recommendation through Multimodal Deep Learning | Sep 11, 2023 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Sep 28, 2023 | Benchmarking | Code Available | 1 | 5 |
| Large Language Models for Multi-Robot Systems: A Survey | Feb 6, 2025 | Action Generation, Benchmarking | Code Available | 1 | 5 |
| LEAF: A Benchmark for Federated Settings | Dec 3, 2018 | Autonomous Vehicles, Benchmarking | Code Available | 1 | 5 |
| LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Nov 1, 2024 | Benchmarking, Mixture-of-Experts | Code Available | 1 | 5 |
| Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Nov 25, 2024 | Benchmarking, object-detection | Code Available | 1 | 5 |
| MIRFLEX: Music Information Retrieval Feature Library for Extraction | Nov 1, 2024 | Benchmarking, Information Retrieval | Code Available | 1 | 5 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Oct 1, 2023 | Benchmarking, Math | Code Available | 1 | 5 |
| Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking | Jun 9, 2024 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging | Jun 6, 2025 | Benchmarking | Code Available | 1 | 5 |
| FiFAR: A Fraud Detection Dataset for Learning to Defer | Dec 20, 2023 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness | Jun 1, 2025 | Benchmarking, Management | Code Available | 0 | 5 |
| Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs | Sep 26, 2024 | Benchmarking, Conformal Prediction | Code Available | 0 | 5 |
| Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Jun 1, 2021 | Benchmarking, Goal-Oriented Dialogue Systems | Code Available | 0 | 5 |