| DispaRisk: Auditing Fairness Through Usable Information | May 20, 2024 | Benchmarking, Bias Detection | Code Available | 0 | 5 |
| A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making | Sep 9, 2024 | Benchmarking, Decision Making | Code Available | 0 | 5 |
| Flexible Generation of Preference Data for Recommendation Analysis | Jul 23, 2024 | Benchmarking, Recommendation Systems | Code Available | 0 | 5 |
| Geological Inference from Textual Data using Word Embeddings | Apr 10, 2025 | Benchmarking, Word Embeddings | Code Available | 0 | 5 |
| Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Apr 26, 2025 | Benchmarking, GPU | Code Available | 0 | 5 |
| Generalization and Regularization in DQN | Sep 29, 2018 | Atari Games, Benchmarking | Code Available | 0 | 5 |
| Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Jul 1, 2022 | Benchmarking, Classification | Code Available | 0 | 5 |
| A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Apr 15, 2024 | Benchmarking | Code Available | 0 | 5 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Feb 22, 2024 | Benchmarking | Code Available | 0 | 5 |
| Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters | Jun 16, 2024 | Benchmarking, Instance Segmentation | Code Available | 0 | 5 |
| GenderBench: Evaluation Suite for Gender Biases in LLMs | May 17, 2025 | Benchmarking | Code Available | 0 | 5 |
| GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Jun 17, 2024 | Benchmarking, Dataset Generation | Code Available | 0 | 5 |
| Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware | Dec 4, 2018 | Benchmarking, CPU | Code Available | 0 | 5 |
| Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction | Jun 20, 2023 | Benchmarking, Document-level Relation Extraction | Code Available | 0 | 5 |
| Dialogue Quality and Emotion Annotations for Customer Support Conversations | Nov 23, 2023 | Benchmarking, Diversity | Code Available | 0 | 5 |
| Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Oct 4, 2023 | Benchmarking, Segmentation | Code Available | 0 | 5 |
| From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories | Apr 23, 2025 | Benchmarking | Code Available | 0 | 5 |
| Benchmarking pre-trained text embedding models in aligning built asset information | Nov 18, 2024 | Asset Management, Benchmarking | Code Available | 0 | 5 |
| From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | May 11, 2025 | Benchmarking, General Knowledge | Code Available | 0 | 5 |
| From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Mar 16, 2023 | Benchmarking, Continual Learning | Code Available | 0 | 5 |
| From raw affiliations to organization identifiers | May 12, 2025 | Benchmarking, Metadata quality | Code Available | 0 | 5 |
| Benchmarking Intersectional Biases in NLP | Jul 1, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 | 5 |
| DFEE: Interactive DataFlow Execution and Evaluation Kit | Dec 4, 2022 | Benchmarking, Scheduling | Code Available | 0 | 5 |
| A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Jun 11, 2025 | Age Estimation, Benchmarking | Code Available | 0 | 5 |
| From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology | Apr 11, 2022 | Benchmarking, Cancer Classification | Code Available | 0 | 5 |