| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Dec 10, 2021 | Benchmarking | CodeCode Available | 1 | 5 |
| CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Jan 22, 2020 | Benchmarkingobject-detection | CodeCode Available | 1 | 5 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Distribution Shift in Tabular Data with TableShift | Dec 10, 2023 | BenchmarkingBinary Classification | CodeCode Available | 1 | 5 |
| Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT | Jun 13, 2024 | BenchmarkingLLM-generated Text Detection | CodeCode Available | 1 | 5 |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Oct 12, 2021 | Benchmarking | CodeCode Available | 1 | 5 |
| CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Jan 2, 2025 | BenchmarkingComputer Security | CodeCode Available | 1 | 5 |
| Benchmarking Differential Privacy and Federated Learning for BERT Models | Jun 26, 2021 | BenchmarkingFederated Learning | CodeCode Available | 1 | 5 |
| Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction | Sep 24, 2023 | 3D Shape ReconstructionAnatomy | CodeCode Available | 1 | 5 |
| AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Jul 11, 2023 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation | Feb 18, 2024 | BenchmarkingLanguage Modeling | CodeCode Available | 1 | 5 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | BenchmarkingCombinatorial Optimization | CodeCode Available | 1 | 5 |
| Benchmarking Language Model Creativity: A Case Study on Code Generation | Jul 12, 2024 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Dec 7, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Mar 25, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Jun 7, 2024 | BenchmarkingContinual Learning | CodeCode Available | 1 | 5 |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Apr 3, 2024 | BenchmarkingGeneral Knowledge | CodeCode Available | 1 | 5 |
| API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs | Feb 23, 2024 | Benchmarkingslot-filling | CodeCode Available | 1 | 5 |
| Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Feb 21, 2025 | Benchmarking | CodeCode Available | 1 | 5 |
| Clinical Prompt Learning with Frozen Language Models | May 11, 2022 | BenchmarkingGPU | CodeCode Available | 1 | 5 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | BenchmarkingCloud Computing | CodeCode Available | 1 | 5 |
| A Platform for the Biomedical Application of Large Language Models | May 10, 2023 | BenchmarkingPrivacy Preserving | CodeCode Available | 1 | 5 |
| Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory | Jul 20, 2023 | BenchmarkingDecision Making | CodeCode Available | 1 | 5 |
| Benchmarking Detection Transfer Learning with Vision Transformers | Nov 22, 2021 | Benchmarkingobject-detection | CodeCode Available | 1 | 5 |
| Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments | Oct 18, 2024 | Autonomous NavigationBenchmarking | CodeCode Available | 1 | 5 |