| Benchmarking Model-Based Reinforcement Learning | Jul 3, 2019 | Benchmarking, model | Code Available | 0 |
| Benchmarking Misuse Mitigation Against Covert Adversaries | Jun 6, 2025 | Benchmarking, Language Modeling | Code Available | 0 |
| To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo | Mar 30, 2022 | Benchmarking, Person-centric Visual Grounding | Code Available | 0 |
| Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods | Dec 3, 2024 | Benchmarking | Code Available | 0 |
| No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets | Feb 4, 2025 | All, Benchmarking | Code Available | 0 |
| To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo | May 1, 2022 | Benchmarking, Person-centric Visual Grounding | Code Available | 0 |
| AstroVision: Towards Autonomous Feature Detection and Description for Missions to Small Bodies Using Deep Learning | Aug 3, 2022 | Benchmarking | Code Available | 0 |
| AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards | Oct 6, 2023 | Benchmarking | Code Available | 0 |
| ShuffleMix: Improving Representations via Channel-Wise Shuffle of Interpolated Hidden States | May 30, 2023 | Benchmarking, Data Augmentation | Code Available | 0 |
| NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark | Apr 10, 2025 | Benchmarking | Code Available | 0 |