| FALCON: Feature-Label Constrained Graph Net Collapse for Memory Efficient GNNs | Dec 27, 2023 | Benchmarking, GPU | Code Available | 0 | 5 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Feb 22, 2024 | Benchmarking | Code Available | 0 | 5 |
| Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware | Dec 4, 2018 | Benchmarking, CPU | Code Available | 0 | 5 |
| GenderBench: Evaluation Suite for Gender Biases in LLMs | May 17, 2025 | Benchmarking | Code Available | 0 | 5 |
| Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction | Jun 20, 2023 | Benchmarking, Document-level Relation Extraction | Code Available | 0 | 5 |
| GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Jun 17, 2024 | Benchmarking, Dataset Generation | Code Available | 0 | 5 |
| Dialogue Quality and Emotion Annotations for Customer Support Conversations | Nov 23, 2023 | Benchmarking, Diversity | Code Available | 0 | 5 |
| Benchmarking Intersectional Biases in NLP | Jul 1, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 | 5 |
| DFEE: Interactive DataFlow Execution and Evaluation Kit | Dec 4, 2022 | Benchmarking, Scheduling | Code Available | 0 | 5 |
| A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Jun 11, 2025 | Age Estimation, Benchmarking | Code Available | 0 | 5 |
| Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | May 13, 2025 | Benchmarking, Diagnostic | Code Available | 0 | 5 |
| Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations | Dec 7, 2020 | Benchmarking, Goal-Oriented Dialog | Code Available | 0 | 5 |
| From raw affiliations to organization identifiers | May 12, 2025 | Benchmarking, Metadata quality | Code Available | 0 | 5 |
| From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories | Apr 23, 2025 | Benchmarking | Code Available | 0 | 5 |
| From Variability to Stability: Advancing RecSys Benchmarking Practices | Feb 15, 2024 | Benchmarking, Collaborative Filtering | Code Available | 0 | 5 |
| From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology | Apr 11, 2022 | Benchmarking, Cancer Classification | Code Available | 0 | 5 |
| From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation | Apr 14, 2024 | Benchmarking, Diversity | Code Available | 0 | 5 |
| From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | May 11, 2025 | Benchmarking, General Knowledge | Code Available | 0 | 5 |
| FR-MRInet: A Deep Convolutional Encoder-Decoder for Brain Tumor Segmentation with Relu-RGB and Sliding-window | Jul 26, 2018 | Benchmarking, Brain Tumor Segmentation | Code Available | 0 | 5 |
| From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Mar 16, 2023 | Benchmarking, Continual Learning | Code Available | 0 | 5 |
| Arabic Speech Recognition by End-to-End, Modular Systems and Human | Jan 21, 2021 | Arabic Speech Recognition, Automatic Speech Recognition | Code Available | 0 | 5 |
| Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings | Apr 4, 2025 | Benchmarking | Code Available | 0 | 5 |
| Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks | Sep 12, 2019 | Affordance Detection, Affordance Recognition | Code Available | 0 | 5 |
| Detecting critical treatment effect bias in small subgroups | Apr 29, 2024 | Benchmarking, Decision Making | Code Available | 0 | 5 |
| FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | May 27, 2025 | Benchmarking, Question Answering | Code Available | 0 | 5 |