| Large-scale Ridesharing DARP Instances Based on Real Travel Demand | May 30, 2023 | Benchmarking | Code Available | 0 |
| Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement | May 26, 2025 | Benchmarking | Code Available | 0 |
| JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Feb 16, 2025 | Benchmarking, GPU | Code Available | 0 |
| Anchor Points: Benchmarking Models with Much Fewer Examples | Sep 14, 2023 | Benchmarking, Language Modeling | Code Available | 0 |
| Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? | May 19, 2021 | Benchmarking, Sentence | Code Available | 0 |
| THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Sep 17, 2024 | Benchmarking, Binary Classification | Code Available | 0 |
| JATE 2.0: Java Automatic Term Extraction with Apache Solr | May 1, 2016 | Benchmarking, Term Extraction | Code Available | 0 |
| JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models | May 23, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Calibrated Adaptive Probabilistic ODE Solvers | Dec 15, 2020 | Benchmarking, Descriptive | Code Available | 0 |
| Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | May 29, 2025 | Benchmarking, Fairness | Code Available | 0 |
| Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations | Jun 12, 2024 | Benchmarking, Deep Reinforcement Learning | Code Available | 0 |
| DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Apr 10, 2024 | Benchmarking, knowledge editing | Code Available | 0 |
| AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms | Oct 16, 2019 | Bayesian Inference, Benchmarking | Code Available | 0 |
| An Auditing Test To Detect Behavioral Shift in Language Models | Oct 25, 2024 | Benchmarking, Change Detection | Code Available | 0 |
| Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Apr 29, 2024 | Benchmarking, Drug Discovery | Code Available | 0 |
| Learnability and Complexity of Quantum Samples | Oct 22, 2020 | Benchmarking | Code Available | 0 |
| Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks | Feb 2, 2025 | Benchmarking | Code Available | 0 |
| Learned Sorted Table Search and Static Indexes in Small Model Space | Jul 19, 2021 | Benchmarking, Open-Ended Question Answering | Code Available | 0 |
| Learn How to Query from Unlabeled Data Streams in Federated Learning | Dec 11, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration | Jul 1, 2024 | Benchmarking | Code Available | 0 |
| Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking | Jul 30, 2018 | Benchmarking, feature selection | Code Available | 0 |
| Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo | Oct 1, 2019 | Benchmarking | Code Available | 0 |
| Adjusting Pretrained Backbones for Performativity | Oct 6, 2024 | Benchmarking, Deep Learning | Code Available | 0 |
| Cable Tree Wiring -- Benchmarking Solvers on a Real-World Scheduling Problem with a Variety of Precedence Constraints | Nov 25, 2020 | Benchmarking, Scheduling | Code Available | 0 |
| Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk | May 21, 2019 | Bayesian Inference, Benchmarking | Code Available | 0 |
| REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching | Jul 16, 2024 | Benchmarking | Code Available | 0 |
| Learning collective multi-cellular dynamics from temporal scRNA-seq via a transformer-enhanced Neural SDE | May 22, 2025 | Benchmarking, Time Series | Code Available | 0 |
| Using representation balancing to learn conditional-average dose responses from clustered data | Sep 7, 2023 | Benchmarking, Causal Inference | Code Available | 0 |
| Beemo: Benchmark of Expert-edited Machine-generated Outputs | Nov 6, 2024 | Benchmarking | Code Available | 0 |
| B-XAIC Dataset: Benchmarking Explainable AI for Graph Neural Networks Using Chemical Data | May 28, 2025 | Benchmarking, Drug Discovery | Code Available | 0 |
| Building Conformal Prediction Intervals with Approximate Message Passing | Oct 21, 2024 | Benchmarking, Conformal Prediction | Code Available | 0 |
| Learning Dynamic Selection and Pricing of Out-of-Home Deliveries | Nov 23, 2023 | Benchmarking, Decision Making | Code Available | 0 |
| UAV Trajectory Planning for Data Collection from Time-Constrained IoT Devices | Sep 17, 2019 | Benchmarking, Trajectory Planning | Code Available | 0 |
| Learning from Integral Losses in Physics Informed Neural Networks | May 27, 2023 | Benchmarking | Code Available | 0 |
| Removing Geometric Bias in One-Class Anomaly Detection with Adaptive Feature Perturbation | Mar 7, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| The Arcade Learning Environment: An Evaluation Platform for General Agents | Jul 19, 2012 | Atari Games, Benchmarking | Code Available | 0 |
| Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting | May 7, 2021 | Benchmarking, Deep Learning | Code Available | 0 |
| Learning protein constitutive motifs from sequence data | Mar 23, 2018 | Benchmarking, Specificity | Code Available | 0 |
| Learning Quantum Processes with Quantum Statistical Queries | Oct 3, 2023 | Benchmarking, Cryptanalysis | Code Available | 0 |
| ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images | Oct 22, 2024 | Benchmarking, Self-Supervised Learning | Code Available | 0 |
| UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions | Jun 18, 2024 | Benchmarking, Multiple-choice | Code Available | 0 |
| BED: Bi-Encoder-Based Detectors for Out-of-Distribution Detection | Jun 15, 2023 | Benchmarking, Out-of-Distribution Detection | Code Available | 0 |
| Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia | Nov 2, 2023 | Benchmarking, Machine Translation | Code Available | 0 |
| RUHSNet: 3D Object Detection Using Lidar Data in Real Time | May 9, 2020 | 3D Object Detection, Autonomous Vehicles | Code Available | 0 |
| Replication Study and Benchmarking of Real-Time Object Detection Models | May 11, 2024 | Benchmarking, object-detection | Code Available | 0 |
| IPC: A Benchmark Data Set for Learning with Graph-Structured Data | May 15, 2019 | Benchmarking, Graph Classification | Code Available | 0 |
| RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Jun 17, 2024 | Benchmarking, General Knowledge | Code Available | 0 |
| Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark | May 9, 2016 | Benchmarking, Emotion Recognition | Code Available | 0 |
| IoT Data Trust Evaluation via Machine Learning | Aug 15, 2023 | Benchmarking, Time Series | Code Available | 0 |
| Representation Learning of Limit Order Book: A Comprehensive Study and Benchmarking | May 4, 2025 | Benchmarking, Representation Learning | Code Available | 0 |