| Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks | Jan 7, 2024 | Benchmarking, Graph Neural Network | Code Available | 0 |
| GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Jan 26, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties | Feb 24, 2025 | Benchmarking | Code Available | 0 |
| Safe Trajectory Generation for Complex Urban Environments Using Spatio-temporal Semantic Corridor | Jun 24, 2019 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| Natural Image Noise Dataset | Jun 1, 2019 | Benchmarking, Denoising | Code Available | 0 |
| Benchmarking Suite for Synthetic Aperture Radar Imagery Anomaly Detection (SARIAD) Algorithms | Apr 10, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration | Sep 17, 2024 | Benchmarking, counterfactual | Code Available | 0 |
| Geological Inference from Textual Data using Word Embeddings | Apr 10, 2025 | Benchmarking, Word Embeddings | Code Available | 0 |
| Flexible Generation of Preference Data for Recommendation Analysis | Jul 23, 2024 | Benchmarking, Recommendation Systems | Code Available | 0 |
| AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels | Oct 26, 2024 | Benchmarking, Information Retrieval | Code Available | 0 |
| MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages | Mar 3, 2025 | Benchmarking | Code Available | 0 |
| The LOCATA Challenge: Acoustic Source Localization and Tracking | Sep 3, 2019 | Benchmarking, Sound Source Localization | Code Available | 0 |
| Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Apr 26, 2025 | Benchmarking, GPU | Code Available | 0 |
| A Meta-Analysis of the Anomaly Detection Problem | Mar 3, 2015 | Anomaly Detection, Benchmarking | Code Available | 0 |
| On the Measure of Intelligence | Nov 5, 2019 | ARC, Benchmarking | Code Available | 0 |
| Generalization and Regularization in DQN | Sep 29, 2018 | Atari Games, Benchmarking | Code Available | 0 |
| Automatic Resolution of Domain Name Disputes | Nov 1, 2021 | Benchmarking | Code Available | 0 |
| Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI | Jun 13, 2025 | Benchmarking, In-Context Learning | Code Available | 0 |
| Automatic benchmarking of large multimodal models via iterative experiment programming | Jun 18, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| GenderBench: Evaluation Suite for Gender Biases in LLMs | May 17, 2025 | Benchmarking | Code Available | 0 |
| MineRL: A Large-Scale Dataset of Minecraft Demonstrations | Jul 29, 2019 | Benchmarking, Deep Reinforcement Learning | Code Available | 0 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Feb 22, 2024 | Benchmarking | Code Available | 0 |
| GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Jun 17, 2024 | Benchmarking, Dataset Generation | Code Available | 0 |
| Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Mar 24, 2025 | Benchmarking, OpenAI Gym | Code Available | 0 |
| Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Oct 4, 2023 | Benchmarking, Segmentation | Code Available | 0 |