| Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks | Jan 7, 2024 | Benchmarking, Graph Neural Network | Code Available | 0 |
| GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Jan 26, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties | Feb 24, 2025 | Benchmarking | Code Available | 0 |
| Safe Trajectory Generation for Complex Urban Environments Using Spatio-temporal Semantic Corridor | Jun 24, 2019 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| Natural Image Noise Dataset | Jun 1, 2019 | Benchmarking, Denoising | Code Available | 0 |
| Benchmarking Suite for Synthetic Aperture Radar Imagery Anomaly Detection (SARIAD) Algorithms | Apr 10, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration | Sep 17, 2024 | Benchmarking, counterfactual | Code Available | 0 |
| Geological Inference from Textual Data using Word Embeddings | Apr 10, 2025 | Benchmarking, Word Embeddings | Code Available | 0 |
| Flexible Generation of Preference Data for Recommendation Analysis | Jul 23, 2024 | Benchmarking, Recommendation Systems | Code Available | 0 |
| AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels | Oct 26, 2024 | Benchmarking, Information Retrieval | Code Available | 0 |
| MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages | Mar 3, 2025 | Benchmarking | Code Available | 0 |
| The LOCATA Challenge: Acoustic Source Localization and Tracking | Sep 3, 2019 | Benchmarking, Sound Source Localization | Code Available | 0 |
| Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Apr 26, 2025 | Benchmarking, GPU | Code Available | 0 |
| A Meta-Analysis of the Anomaly Detection Problem | Mar 3, 2015 | Anomaly Detection, Benchmarking | Code Available | 0 |
| On the Measure of Intelligence | Nov 5, 2019 | ARC, Benchmarking | Code Available | 0 |
| Generalization and Regularization in DQN | Sep 29, 2018 | Atari Games, Benchmarking | Code Available | 0 |
| Automatic Resolution of Domain Name Disputes | Nov 1, 2021 | Benchmarking | Code Available | 0 |
| Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI | Jun 13, 2025 | Benchmarking, In-Context Learning | Code Available | 0 |
| Automatic benchmarking of large multimodal models via iterative experiment programming | Jun 18, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| GenderBench: Evaluation Suite for Gender Biases in LLMs | May 17, 2025 | Benchmarking | Code Available | 0 |
| MineRL: A Large-Scale Dataset of Minecraft Demonstrations | Jul 29, 2019 | Benchmarking, Deep Reinforcement Learning | Code Available | 0 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Feb 22, 2024 | Benchmarking | Code Available | 0 |
| GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Jun 17, 2024 | Benchmarking, Dataset Generation | Code Available | 0 |
| Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Mar 24, 2025 | Benchmarking, OpenAI Gym | Code Available | 0 |
| Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Oct 4, 2023 | Benchmarking, Segmentation | Code Available | 0 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Sep 10, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| Mirage: Model-Agnostic Graph Distillation for Graph Classification | Oct 14, 2023 | Benchmarking, Classification | Code Available | 0 |
| Benchmarking Subset Selection from Large Candidate Solution Sets in Evolutionary Multi-objective Optimization | Jan 18, 2022 | Benchmarking | Code Available | 0 |
| Sanity Simulations for Saliency Methods | May 13, 2021 | Benchmarking | Code Available | 0 |
| From Variability to Stability: Advancing RecSys Benchmarking Practices | Feb 15, 2024 | Benchmarking, Collaborative Filtering | Code Available | 0 |
| ALTIS: Modernizing GPGPU Benchmarking | Jun 25, 2019 | Benchmarking, GPU | Code Available | 0 |
| From raw affiliations to organization identifiers | May 12, 2025 | Benchmarking, Metadata quality | Code Available | 0 |
| Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights | May 26, 2025 | Benchmarking, Question Answering | Code Available | 0 |
| 3D Face Reconstruction Error Decomposed: A Modular Benchmark for Fair and Fast Method Evaluation | May 23, 2025 | 3D Face Reconstruction, Benchmarking | Code Available | 0 |
| MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning | Dec 24, 2024 | Benchmarking | Code Available | 0 |
| From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories | Apr 23, 2025 | Benchmarking | Code Available | 0 |
| The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by Isolating Task-Specific Subnetworks in Feedforward Neural Networks | Jul 18, 2022 | Benchmarking | Code Available | 0 |
| MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Apr 7, 2024 | Benchmarking, knowledge editing | Code Available | 0 |
| SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks | Jun 16, 2022 | Benchmarking, Dynamic neural networks | Code Available | 0 |
| MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios | Jun 15, 2025 | Benchmarking | Code Available | 0 |
| From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Mar 16, 2023 | Benchmarking, Continual Learning | Code Available | 0 |
| SAWEC: Sensing-Assisted Wireless Edge Computing | Feb 15, 2024 | Benchmarking, Edge-computing | Code Available | 0 |
| From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | May 11, 2025 | Benchmarking, General Knowledge | Code Available | 0 |
| Vote'n'Rank: Revision of Benchmarking with Social Choice Theory | Oct 11, 2022 | Benchmarking, Result aggregation | Code Available | 0 |
| AlphaZip: Neural Network-Enhanced Lossless Text Compression | Sep 23, 2024 | Benchmarking, Data Compression | Code Available | 0 |
| ML-Net: multi-label classification of biomedical texts with deep neural networks | Nov 13, 2018 | Benchmarking, Classification | Code Available | 0 |
| From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology | Apr 11, 2022 | Benchmarking, Cancer Classification | Code Available | 0 |
| mlOSP: Towards a Unified Implementation of Regression Monte Carlo Algorithms | Dec 1, 2020 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation | Apr 14, 2024 | Benchmarking, Diversity | Code Available | 0 |
| MLPerf Inference Benchmark | Nov 6, 2019 | Benchmarking | Code Available | 0 |