| Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients | Jul 17, 2023 | Benchmarking, GPU | Code Available | 0 |
| SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data | Apr 8, 2023 | Benchmarking, Data Augmentation | Code Available | 0 |
| NSINA: A News Corpus for Sinhala | Mar 25, 2024 | Articles, Benchmarking | Code Available | 0 |
| Improving Sequential Recommendation Models with an Enhanced Loss Function | Jan 3, 2023 | Benchmarking, Recommendation Systems | Code Available | 0 |
| Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks | Sep 8, 2019 | Benchmarking, Classification | Code Available | 0 |
| Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Feb 28, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins | Aug 21, 2024 | Benchmarking | Code Available | 0 |
| A Seq2Seq approach to Symbolic Regression | Oct 17, 2020 | Benchmarking, regression | Code Available | 0 |
| A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models | Apr 28, 2022 | Benchmarking, Diversity | Code Available | 0 |
| Simitate: A Hybrid Imitation Learning Benchmark | May 15, 2019 | Benchmarking, Imitation Learning | Code Available | 0 |
| Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere | Mar 27, 2019 | Benchmarking | Code Available | 0 |
| ECBD: Evidence-Centered Benchmark Design for NLP | Jun 13, 2024 | Benchmarking | Code Available | 0 |
| A Continuous Optimisation Benchmark Suite from Neural Network Regression | Sep 12, 2021 | Benchmarking, Evolutionary Algorithms | Code Available | 0 |
| An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder | Sep 20, 2023 | Benchmarking, Clustering | Code Available | 0 |
| Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique | Dec 6, 2023 | Benchmarking, Knowledge Graphs | Code Available | 0 |
| DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning | Mar 9, 2025 | Benchmarking, Decision Making | Code Available | 0 |
| DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems | Jun 8, 2023 | Benchmarking, Descriptive | Code Available | 0 |
| Simple GNNs with Low Rank Non-parametric Aggregators | Oct 8, 2023 | Benchmarking, Node Classification | Code Available | 0 |
| Effective Stabilized Self-Training on Few-Labeled Graph Data | Oct 7, 2019 | Benchmarking, Model Selection | Code Available | 0 |
| Simulated Contextual Bandits for Personalization Tasks from Recommendation Datasets | Oct 12, 2022 | Benchmarking, Multi-Armed Bandits | Code Available | 0 |
| A Deep Reinforcement Learning Framework for Dynamic Portfolio Optimization: Evidence from China's Stock Market | Dec 24, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs | Oct 1, 2019 | Benchmarking, Dialogue Generation | Code Available | 0 |
| DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Jul 16, 2025 | Benchmarking, Knowledge Distillation | Code Available | 0 |
| Referenced Thermodynamic Integration for Bayesian Model Selection: Application to COVID-19 Model Selection | Sep 8, 2020 | Benchmarking, Epidemiology | Code Available | 0 |
| Simulation-based Benchmarking for Causal Structure Learning in Gene Perturbation Experiments | Jul 8, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation | Sep 24, 2024 | Benchmarking, Movie Recommendation | Code Available | 0 |
| OG-SPACE: Optimized Stochastic Simulation of Spatial Models of Cancer Evolution | Oct 13, 2021 | Benchmarking | Code Available | 0 |
| Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions | May 16, 2024 | Benchmarking, Reinforcement Learning (RL) | Code Available | 0 |
| Okapi: Generalising Better by Making Statistical Matches Match | Nov 7, 2022 | Benchmarking, Binary Classification | Code Available | 0 |
| DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations | Jan 24, 2022 | Benchmarking, Drug Discovery | Code Available | 0 |
| DQI: Measuring Data Quality in NLP | May 2, 2020 | Active Learning, Benchmarking | Code Available | 0 |
| ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks | Jan 29, 2024 | Benchmarking, Cross-Lingual Transfer | Code Available | 0 |
| Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | May 23, 2023 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Code Available | 0 |
| WebSuite: Systematically Evaluating Why Web Agents Fail | Jun 1, 2024 | Benchmarking, Diagnostic | Code Available | 0 |
| Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation | Jul 17, 2020 | Benchmarking, Disentanglement | Code Available | 0 |
| Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification | Jul 18, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Nov 15, 2023 | Benchmarking, Network Pruning | Code Available | 0 |
| Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M | May 15, 2025 | Benchmarking, Memorization | Code Available | 0 |
| Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs | Apr 7, 2025 | Benchmarking, Fairness | Code Available | 0 |
| A Review of Testing Object-Based Environment Perception for Safe Automated Driving | Feb 16, 2021 | Benchmarking, Sensor Modeling | Code Available | 0 |
| Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo | May 3, 2024 | Benchmarking, Multi-hop Question Answering | Code Available | 0 |
| Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models | Feb 21, 2025 | Benchmarking, Diagnostic | Code Available | 0 |
| On dataset transferability in medical image classification | Dec 28, 2024 | Benchmarking, Classification | Code Available | 0 |
| Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions? | May 7, 2025 | Benchmarking, Semantic Segmentation | Code Available | 0 |
| Do LLM Evaluators Prefer Themselves for a Reason? | Apr 4, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems | Jul 26, 2023 | Benchmarking, CPU | Code Available | 0 |
| Benchmarking Long-tail Generalization with Likelihood Splits | Oct 13, 2022 | Benchmarking, Language Modeling | Code Available | 0 |
| UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking | May 21, 2025 | Benchmarking, Claim Verification | Code Available | 0 |
| On Empirical Comparisons of Optimizers for Deep Learning | Oct 11, 2019 | Benchmarking, Deep Learning | Code Available | 0 |
| Benchmarking LLMs' Judgments with No Gold Standard | Nov 11, 2024 | Benchmarking, Machine Translation | Code Available | 0 |