| Benchmarking of Query Strategies: Towards Future Deep Active Learning | Dec 10, 2023 | Active Learning, Benchmarking | Code Available | 0 |
| Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows | Mar 13, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks | Mar 15, 2019 | Benchmarking, Citation Recommendation | Code Available | 0 |
| Named Clinical Entity Recognition Benchmark | Oct 7, 2024 | Benchmarking, Decoder | Code Available | 0 |
| EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models | May 2, 2025 | Benchmarking | Code Available | 0 |
| Evaluating the Transferability of Machine-Learned Force Fields for Material Property Modeling | Jan 10, 2023 | Benchmarking, Graph Neural Network | Code Available | 0 |
| Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring | Feb 10, 2025 | Benchmarking | Code Available | 0 |
| Evaluating the Robustness of Deep Reinforcement Learning for Autonomous Policies in a Multi-agent Urban Driving Environment | Dec 22, 2021 | Autonomous Driving, Benchmarking | Code Available | 0 |
| Watts: Infrastructure for Open-Ended Learning | Apr 28, 2022 | Benchmarking | Code Available | 0 |
| Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks | Jul 2, 2024 | Activity Prediction, Anomaly Detection | Code Available | 0 |
| A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems | Jun 25, 2024 | Benchmarking, Collaborative Filtering | Code Available | 0 |
| SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | May 23, 2025 | Benchmarking, Classification | Code Available | 0 |
| Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses | May 19, 2023 | Benchmarking, Form | Code Available | 0 |
| Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition | Sep 11, 2024 | Benchmarking, Novelty Detection | Code Available | 0 |
| Evaluating Shallow and Deep Neural Networks for Network Intrusion Detection Systems in Cyber Security | Oct 8, 2018 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Transparent and Scrutable Recommendations Using Natural Language User Profiles | Feb 8, 2024 | Benchmarking, Descriptive | Code Available | 0 |
| SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations | Jul 8, 2025 | 6D Pose Estimation, 6D Pose Estimation using RGB | Code Available | 0 |
| SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing | Oct 14, 2024 | Benchmarking, Management | Code Available | 0 |
| A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction | Nov 8, 2023 | Benchmarking, Click-Through Rate Prediction | Code Available | 0 |
| Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learning | Sep 8, 2023 | Benchmarking, Continual Learning | Code Available | 0 |
| Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles | Jan 15, 2025 | Benchmarking | Code Available | 0 |
| NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks | May 4, 2025 | Benchmarking, Representation Learning | Code Available | 0 |
| NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation | Oct 30, 2024 | Benchmarking, Continual Learning | Code Available | 0 |
| A Systematic Review of Green AI | Jan 26, 2023 | Benchmarking | Code Available | 0 |
| Evaluating LLP Methods: Challenges and Approaches | Oct 29, 2023 | Benchmarking, Model Selection | Code Available | 0 |
| Evaluating Feature Attribution Methods in the Image Domain | Feb 22, 2022 | Benchmarking | Code Available | 0 |
| NegBio: a high-performance tool for negation and uncertainty detection in radiology reports | Dec 16, 2017 | Benchmarking, Negation | Code Available | 0 |
| A Comprehensive Comparison of Multi-Dimensional Image Denoising Methods | Nov 6, 2020 | Benchmarking, Denoising | Code Available | 0 |
| NeMig -- A Bilingual News Collection and Knowledge Graph about Migration | Sep 1, 2023 | Articles, Benchmarking | Code Available | 0 |
| NengoDL: Combining deep learning and neuromorphic modelling methods | May 28, 2018 | Benchmarking, Deep Learning | Code Available | 0 |
| Evaluating AI Recruitment Sourcing Tools by Human Preference | Apr 3, 2025 | Benchmarking | Code Available | 0 |
| EvalAI: Towards Better Evaluation Systems for AI Agents | Feb 10, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Essential guidelines for computational method benchmarking | Dec 3, 2018 | Benchmarking | Code Available | 0 |
| Benchmarking of LSTM Networks | Aug 11, 2015 | Benchmarking | Code Available | 0 |
| NerveNet: Learning Structured Policy with Graph Neural Networks | Jan 1, 2018 | Benchmarking, continuous-control | Code Available | 0 |
| How Fragile is Relation Extraction under Entity Replacements? | May 22, 2023 | Benchmarking, Causal Inference | Code Available | 0 |
| Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress? | Feb 25, 2020 | Benchmarking, Link Prediction | Code Available | 0 |
| Sequence-Aware Recommender Systems | Feb 23, 2018 | Benchmarking, Matrix Completion | Code Available | 0 |
| WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation | Aug 22, 2024 | Benchmarking, Classification | Code Available | 0 |
| Enterprise Benchmarks for Large Language Model Evaluation | Oct 11, 2024 | Benchmarking, Language Model Evaluation | Code Available | 0 |
| Enriching Social Science Research via Survey Item Linking | Dec 20, 2024 | Benchmarking, Entity Disambiguation | Code Available | 0 |
| Sequential Large Language Model-Based Hyper-parameter Optimization | Oct 27, 2024 | Bayesian Optimization, Benchmarking | Code Available | 0 |
| Neural Network Design: Learning from Neural Architecture Search | Nov 1, 2020 | Benchmarking, image-classification | Code Available | 0 |
| Benchmarking of image registration methods for differently stained histological slides | Oct 11, 2018 | Benchmarking, BIRL | Code Available | 0 |
| BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs | Jun 21, 2022 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Enhancing Video Summarization with Context Awareness | Apr 6, 2024 | Benchmarking, Informativeness | Code Available | 0 |
| Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective | May 8, 2025 | Active Learning, Benchmarking | Code Available | 0 |
| Benchmarking Neural Machine Translation for Southern African Languages | Jun 17, 2019 | Benchmarking, Machine Translation | Code Available | 0 |
| Enhancing Hyper-To-Real Space Projections Through Euclidean Norm Meta-Heuristic Optimization | Jan 31, 2023 | Benchmarking | Code Available | 0 |
| Enhancing Biomedical Knowledge Discovery for Diseases: An Open-Source Framework Applied on Rett Syndrome and Alzheimer's Disease | Jul 18, 2024 | Benchmarking | Code Available | 0 |