| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | Benchmarking, Code Generation | Code Available | 0 | 5 |
| Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory | May 21, 2025 | Benchmarking, Language Modeling | Code Available | 0 | 5 |
| Analyzing the Feature Extractor Networks for Face Image Synthesis | Jun 4, 2024 | Benchmarking, Image Generation | Code Available | 0 | 5 |
| IoT Data Trust Evaluation via Machine Learning | Aug 15, 2023 | Benchmarking, Time Series | Code Available | 0 | 5 |
| Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Oct 8, 2014 | Benchmarking | Code Available | 0 | 5 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 | 5 |
| Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Jul 13, 2021 | Benchmarking, Decision Making | Code Available | 0 | 5 |
| Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement | May 26, 2025 | Benchmarking | Code Available | 0 | 5 |
| STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking | May 16, 2025 | Benchmarking | Code Available | 0 | 5 |
| Calibrated Adaptive Probabilistic ODE Solvers | Dec 15, 2020 | Benchmarking, Descriptive | Code Available | 0 | 5 |
| INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition | Jun 10, 2024 | Benchmarking, Emotion Recognition | Code Available | 0 | 5 |
| Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Sep 22, 2024 | AutoML, Benchmarking | Code Available | 0 | 5 |
| IPC: A Benchmark Data Set for Learning with Graph-Structured Data | May 15, 2019 | Benchmarking, Graph Classification | Code Available | 0 | 5 |
| Machine Learning Cryptanalysis of a Quantum Random Number Generator | May 7, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 | 5 |
| Integrating Expert Knowledge into Logical Programs via LLMs | Feb 17, 2025 | Benchmarking, Logical Reasoning | Code Available | 0 | 5 |
| Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Apr 10, 2025 | Adversarial Robustness, Benchmarking | Code Available | 0 | 5 |
| Cable Tree Wiring -- Benchmarking Solvers on a Real-World Scheduling Problem with a Variety of Precedence Constraints | Nov 25, 2020 | Benchmarking, Scheduling | Code Available | 0 | 5 |
| Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Mar 11, 2025 | Benchmarking, Hyperparameter Optimization | Code Available | 0 | 5 |
| InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Dec 23, 2021 | Benchmarking, Deep Learning | Code Available | 0 | 5 |
| B-XAIC Dataset: Benchmarking Explainable AI for Graph Neural Networks Using Chemical Data | May 28, 2025 | Benchmarking, Drug Discovery | Code Available | 0 | 5 |
| inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences | Jun 25, 2025 | Benchmarking | Code Available | 0 | 5 |
| Multitask learning and benchmarking with clinical time series data | Jun 17, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 | 5 |
| Building Conformal Prediction Intervals with Approximate Message Passing | Oct 21, 2024 | Benchmarking, Conformal Prediction | Code Available | 0 | 5 |
| CodeS: Towards Code Model Generalization Under Distribution Shift | Jun 11, 2022 | Benchmarking, Code Classification | Code Available | 0 | 5 |
| Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting | May 7, 2021 | Benchmarking, Deep Learning | Code Available | 0 | 5 |