| Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Oct 23, 2024 | Articles, Benchmarking | Code Available | 0 |
| CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset | May 25, 2023 | Benchmarking, Text to SQL | Code Available | 0 |
| Cryo-RALib -- a modular library for accelerating alignment in cryo-EM | Nov 11, 2020 | Benchmarking, GPU | Code Available | 0 |
| What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition | Jan 23, 2024 | Benchmarking | Code Available | 0 |
| STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions | Sep 20, 2024 | Benchmarking, Sensitivity | Code Available | 0 |
| Cross-Lingual Text Classification of Transliterated Hindi and Malayalam | Aug 31, 2021 | Benchmarking, Classification | Code Available | 0 |
| Benchmarking Flexible Electric Loads Scheduling Algorithms under Market Price Uncertainty | Feb 4, 2020 | Benchmarking, Decision Making | Code Available | 0 |
| Yum-me: A Personalized Nutrient-based Meal Recommender System | May 25, 2016 | Benchmarking, Recommendation Systems | Code Available | 0 |
| Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation | Dec 11, 2024 | Benchmarking, Federated Learning | Code Available | 0 |
| Cross-lingual sentiment classification in low-resource Bengali language | Nov 1, 2020 | Benchmarking, Classification | Code Available | 0 |
| Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | May 4, 2025 | Benchmarking, Feature Upsampling | Code Available | 0 |
| STREETS: A Novel Camera Network Dataset for Traffic Flow | Dec 1, 2019 | Benchmarking | Code Available | 0 |
| Benchmarking Feature-based Algorithm Selection Systems for Black-box Numerical Optimization | Sep 17, 2021 | Benchmarking | Code Available | 0 |
| Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs | Oct 17, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking Failures in Tool-Augmented Language Models | Mar 18, 2025 | Benchmarking, Text Generation | Code Available | 0 |
| CRNN: A Joint Neural Network for Redundancy Detection | Jun 4, 2017 | Benchmarking, General Classification | Code Available | 0 |
| Critical review of conformational B-cell epitope prediction methods | Jan 10, 2023 | Benchmarking, Drug Design | Code Available | 0 |
| PICO Element Detection in Medical Text via Long Short-Term Memory Neural Networks | Jul 1, 2018 | Benchmarking, Decision Making | Code Available | 0 |
| Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks | Jan 13, 2025 | Benchmarking | Code Available | 0 |
| CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching | Apr 25, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |
| PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data | Feb 6, 2025 | Benchmarking, Time Series | Code Available | 0 |
| An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms | Mar 23, 2022 | Benchmarking, Deep Reinforcement Learning | Code Available | 0 |
| STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking | Jul 4, 2025 | Benchmarking, Navigate | Code Available | 0 |
| An open unified deep graph learning framework for discovering drug leads | Dec 6, 2022 | Benchmarking, Drug Discovery | Code Available | 0 |
| PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU | Jan 16, 2025 | Benchmarking, continuous-control | Code Available | 0 |
| PixelHop: A Successive Subspace Learning (SSL) Method for Object Classification | Sep 17, 2019 | Benchmarking, Decision Making | Code Available | 0 |
| pke: an open source python-based keyphrase extraction toolkit | Dec 1, 2016 | Benchmarking, Keyphrase Extraction | Code Available | 0 |
| Benchmarking Educational Program Repair | May 8, 2024 | Benchmarking, Program Repair | Code Available | 0 |
| A Benchmarking Study of Vision-based Robotic Grasping Algorithms | Mar 14, 2025 | Benchmarking, Robotic Grasping | Code Available | 0 |
| CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization | Oct 25, 2022 | Abstractive Text Summarization, Benchmarking | Code Available | 0 |
| CREPO: An Open Repository to Benchmark Credal Network Algorithms | May 10, 2021 | Benchmarking | Code Available | 0 |
| A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making | Sep 9, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI | Nov 23, 2023 | Benchmarking, Cloud Detection | Code Available | 0 |
| CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models | Mar 18, 2025 | Benchmarking, Spatial Reasoning | Code Available | 0 |
| ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets | Jun 20, 2022 | Benchmarking, Fraud Detection | Code Available | 0 |
| PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison | Mar 1, 2017 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework | Sep 24, 2024 | Benchmarking, counterfactual | Code Available | 0 |
| pmuBAGE: The Benchmarking Assortment of Generated PMU Data for Power System Events -- Part I: Overview and Results | Apr 3, 2022 | Benchmarking | Code Available | 0 |
| pmuBAGE: The Benchmarking Assortment of Generated PMU Data for Power System Events | Oct 25, 2022 | Benchmarking | Code Available | 0 |
| Continuous Optimization Benchmarks by Simulation | Aug 14, 2020 | Benchmarking, Gaussian Processes | Code Available | 0 |
| Continual Learning Strategies for 3D Engineering Regression Problems: A Benchmarking Study | Apr 16, 2025 | Benchmarking, Continual Learning | Code Available | 0 |
| Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems | Mar 5, 2025 | Benchmarking, CPU | Code Available | 0 |
| Structured Prediction Problem Archive | Feb 4, 2022 | Benchmarking, Prediction | Code Available | 0 |
| Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking | Sep 23, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Benchmarking down-scaled (not so large) pre-trained language models | Sep 1, 2021 | Benchmarking | Code Available | 0 |
| PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Apr 6, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| ContextGNN goes to Elliot: Towards Benchmarking Relational Deep Learning for Static Link Prediction (aka Personalized Item Recommendation) | Mar 20, 2025 | Benchmarking, Link Prediction | Code Available | 0 |
| Selected Languages are All You Need for Cross-lingual Truthfulness Transfer | Jun 20, 2024 | All, Benchmarking | Code Available | 0 |
| Content-Aware Differential Privacy with Conditional Invertible Neural Networks | Jul 29, 2022 | Benchmarking | Code Available | 0 |
| Population-wise Labeling of Sulcal Graphs using Multi-graph Matching | Jan 31, 2023 | Benchmarking, Graph Matching | Code Available | 0 |