| Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Oct 23, 2024 | Articles, Benchmarking | Code Available | 0 |
| CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset | May 25, 2023 | Benchmarking, Text to SQL | Code Available | 0 |
| Cryo-RALib -- a modular library for accelerating alignment in cryo-EM | Nov 11, 2020 | Benchmarking, GPU | Code Available | 0 |
| What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition | Jan 23, 2024 | Benchmarking | Code Available | 0 |
| STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions | Sep 20, 2024 | Benchmarking, Sensitivity | Code Available | 0 |
| Cross-Lingual Text Classification of Transliterated Hindi and Malayalam | Aug 31, 2021 | Benchmarking, Classification | Code Available | 0 |
| Benchmarking Flexible Electric Loads Scheduling Algorithms under Market Price Uncertainty | Feb 4, 2020 | Benchmarking, Decision Making | Code Available | 0 |
| Yum-me: A Personalized Nutrient-based Meal Recommender System | May 25, 2016 | Benchmarking, Recommendation Systems | Code Available | 0 |
| Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation | Dec 11, 2024 | Benchmarking, Federated Learning | Code Available | 0 |
| Cross-lingual sentiment classification in low-resource Bengali language | Nov 1, 2020 | Benchmarking, Classification | Code Available | 0 |
| Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | May 4, 2025 | Benchmarking, Feature Upsampling | Code Available | 0 |
| STREETS: A Novel Camera Network Dataset for Traffic Flow | Dec 1, 2019 | Benchmarking | Code Available | 0 |
| Benchmarking Feature-based Algorithm Selection Systems for Black-box Numerical Optimization | Sep 17, 2021 | Benchmarking | Code Available | 0 |
| Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs | Oct 17, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking Failures in Tool-Augmented Language Models | Mar 18, 2025 | Benchmarking, Text Generation | Code Available | 0 |
| CRNN: A Joint Neural Network for Redundancy Detection | Jun 4, 2017 | Benchmarking, General Classification | Code Available | 0 |
| Critical review of conformational B-cell epitope prediction methods | Jan 10, 2023 | Benchmarking, Drug Design | Code Available | 0 |
| PICO Element Detection in Medical Text via Long Short-Term Memory Neural Networks | Jul 1, 2018 | Benchmarking, Decision Making | Code Available | 0 |
| Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks | Jan 13, 2025 | Benchmarking | Code Available | 0 |
| CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching | Apr 25, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |
| PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data | Feb 6, 2025 | Benchmarking, Time Series | Code Available | 0 |
| An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms | Mar 23, 2022 | Benchmarking, Deep Reinforcement Learning | Code Available | 0 |
| STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking | Jul 4, 2025 | Benchmarking, Navigate | Code Available | 0 |
| An open unified deep graph learning framework for discovering drug leads | Dec 6, 2022 | Benchmarking, Drug Discovery | Code Available | 0 |
| PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU | Jan 16, 2025 | Benchmarking, continuous-control | Code Available | 0 |