| Benchmarking Graph Neural Networks on Dynamic Link Prediction | Sep 29, 2021 | Benchmarking, Dynamic Link Prediction | Code Available | 1 |
| Benchmarking Graph Neural Networks for FMRI analysis | Nov 16, 2022 | Benchmarking | Code Available | 1 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Jun 22, 2023 | Arithmetic Reasoning, Benchmarking | Code Available | 1 |
| BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation | May 7, 2022 | 6D Pose Estimation, Benchmarking | Code Available | 1 |
| ClearPose: Large-scale Transparent Object Dataset and Benchmark | Mar 8, 2022 | Benchmarking, Depth Completion | Code Available | 1 |
| BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Apr 28, 2025 | Benchmarking | Code Available | 1 |
| Performance Evaluation of Deep Transfer Learning on Multiclass Identification of Common Weed Species in Cotton Production Systems | Oct 11, 2021 | Benchmarking, Management | Code Available | 1 |
| PGDQN: Preference-Guided Deep Q-Network | Oct 3, 2023 | Atari Games, Benchmarking | Code Available | 1 |
| Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Oct 11, 2024 | Benchmarking, Image Segmentation | Code Available | 1 |
| Beyond neural scaling laws: beating power law scaling via data pruning | Jun 29, 2022 | Benchmarking | Code Available | 1 |
| Beyond Normal: On the Evaluation of Mutual Information Estimators | Jun 19, 2023 | Benchmarking, Domain Generalization | Code Available | 1 |
| CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Jan 2, 2025 | Benchmarking, Computer Security | Code Available | 1 |
| dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing | Apr 27, 2021 | Benchmarking, Retrieval | Code Available | 1 |
| PLANTAIN: Diffusion-inspired Pose Score Minimization for Fast and Accurate Molecular Docking | Jul 22, 2023 | Benchmarking, Molecular Docking | Code Available | 1 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Aug 31, 2023 | Benchmarking, Dataset Generation | Code Available | 1 |
| ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning | Feb 8, 2022 | Benchmarking, Language Modelling | Code Available | 1 |
| Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy | Oct 23, 2020 | Benchmarking, Diagnostic | Code Available | 1 |
| RADIATE: A Radar Dataset for Automotive Perception in Bad Weather | Oct 18, 2020 | Autonomous Driving, Benchmarking | Code Available | 1 |
| POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding | Jul 20, 2024 | Benchmarking, Heuristic Search | Code Available | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Jun 7, 2024 | Benchmarking, Continual Learning | Code Available | 1 |
| Positional Encoding in Transformer-Based Time Series Models: A Survey | Feb 17, 2025 | Anomaly Detection, Benchmarking | Code Available | 1 |
| PowerMamba: A Deep State Space Model and Comprehensive Benchmark for Time Series Prediction in Electric Power Systems | Dec 9, 2024 | Benchmarking, Prediction | Code Available | 1 |
| Benchmarking Graph Learning for Drug-Drug Interaction Prediction | Oct 24, 2024 | Benchmarking, Graph Learning | Unverified | 0 |
| A practical generalization metric for deep networks benchmarking | Sep 2, 2024 | Benchmarking, Diversity | Unverified | 0 |
| AERF: Adaptive ensemble random fuzzy algorithm for anomaly detection in cloud computing | Jan 9, 2023 | Anomaly Detection, Benchmarking | Unverified | 0 |