| Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Feb 8, 2025 | Benchmarking, Drug Design | Code Available | 0 |
| Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Jul 17, 2024 | Adversarial Robustness, Benchmarking | Code Available | 0 |
| Fine-grained Hand Gesture Recognition in Multi-viewpoint Hand Hygiene | Sep 7, 2021 | Benchmarking, Fine-Grained Image Recognition | Code Available | 0 |
| Moment Matching for Multi-Source Domain Adaptation | Dec 4, 2018 | Benchmarking, Domain Adaptation | Code Available | 0 |
| Benchmarking Robustness to Text-Guided Corruptions | Apr 6, 2023 | Benchmarking, Data Augmentation | Code Available | 0 |
| Fine-grained Entity Recognition with Reduced False Negatives and Large Type Coverage | Apr 30, 2019 | Benchmarking | Code Available | 0 |
| Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Aug 23, 2023 | Benchmarking, regression | Code Available | 0 |
| Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data | Sep 24, 2024 | Benchmarking, Depth Estimation | Code Available | 0 |
| Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving | Mar 20, 2023 | 3D Object Detection, Autonomous Driving | Code Available | 0 |
| Scission: Performance-driven and Context-aware Cloud-Edge Distribution of Deep Neural Networks | Aug 8, 2020 | Benchmarking, Decision Making | Code Available | 0 |
| ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles | Mar 13, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming | Jul 17, 2019 | Autonomous Driving, Benchmarking | Code Available | 0 |
| Motley: Benchmarking Heterogeneity and Personalization in Federated Learning | Jun 18, 2022 | Benchmarking, Fairness | Code Available | 0 |
| ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning | May 30, 2023 | Benchmarking, In-Context Learning | Code Available | 0 |
| Benchmarking Retinal Blood Vessel Segmentation Models for Cross-Dataset and Cross-Disease Generalization | Jun 21, 2024 | Benchmarking, Segmentation | Code Available | 0 |
| The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA | May 2, 2024 | Benchmarking, Drug Discovery | Code Available | 0 |
| AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | May 27, 2025 | Benchmarking, Question Selection | Code Available | 0 |
| Benchmarking Representation Learning for Natural World Image Collections | Mar 30, 2021 | Benchmarking, Binary Classification | Code Available | 0 |
| Benchmarking Reinforcement Learning Algorithms on Real-World Robots | Sep 20, 2018 | Benchmarking, continuous-control | Code Available | 0 |
| Benchmarking Quantum Reinforcement Learning | Jan 27, 2025 | Benchmarking, reinforcement-learning | Code Available | 0 |
| MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization | May 1, 2022 | Benchmarking, dialogue summary | Code Available | 0 |
| Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models | Jun 22, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare | Apr 15, 2025 | Benchmarking, Diagnostic | Code Available | 0 |
| Benchmarking quantum machine learning kernel training for classification tasks | Aug 17, 2024 | Benchmarking, Quantum Machine Learning | Code Available | 0 |
| The Saudi Privacy Policy Dataset | Apr 5, 2023 | Benchmarking | Code Available | 0 |
| MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation | Jan 9, 2024 | Benchmarking, Interactive Segmentation | Code Available | 0 |
| ferret: a Framework for Benchmarking Explainers on Transformers | Aug 2, 2022 | Benchmarking, Explainable Artificial Intelligence (XAI) | Code Available | 0 |
| Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish | Sep 13, 2023 | Benchmarking, Translation | Code Available | 0 |
| FEET: A Framework for Evaluating Embedding Techniques | Nov 2, 2024 | Benchmarking, Representation Learning | Code Available | 0 |
| Benchmarking Probabilistic Deep Learning Methods for License Plate Recognition | Feb 2, 2023 | Benchmarking, Deep Learning | Code Available | 0 |
| Unraveling the Capabilities of Language Models in News Summarization | Jan 30, 2025 | Benchmarking, Few-Shot Learning | Code Available | 0 |
| mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Jun 26, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks | Apr 18, 2021 | Benchmarking, Federated Learning | Code Available | 0 |
| MUBen: Benchmarking the Uncertainty of Molecular Representation Models | Jun 14, 2023 | Benchmarking, Drug Discovery | Code Available | 0 |
| The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection | Sep 17, 2024 | Benchmarking, Event Detection | Code Available | 0 |
| WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection | Mar 13, 2020 | Abuse Detection, Benchmarking | Code Available | 0 |
| FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Jun 8, 2023 | Benchmarking, Federated Learning | Code Available | 0 |
| Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning | May 27, 2025 | Benchmarking | Code Available | 0 |
| Feature interpretability in BCIs: exploring the role of network lateralization | Jul 16, 2024 | Benchmarking, EEG | Code Available | 0 |
| AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Oct 28, 2024 | Benchmarking, Question Answering | Code Available | 0 |
| Benchmarking pre-trained text embedding models in aligning built asset information | Nov 18, 2024 | Asset Management, Benchmarking | Code Available | 0 |
| Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task | Apr 1, 2021 | Benchmarking, Language Modeling | Code Available | 0 |
| Feature embedding in click-through rate prediction | Sep 20, 2022 | Benchmarking, Click-Through Rate Prediction | Code Available | 0 |
| Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks | Jun 16, 2023 | Benchmarking | Code Available | 0 |
| FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback | Oct 12, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis | Feb 18, 2025 | Benchmarking, Mamba | Code Available | 0 |
| Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval | Nov 3, 2023 | Benchmarking, Fairness | Code Available | 0 |
| AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements | Dec 4, 2020 | Benchmarking, Lip password classification | Code Available | 0 |
| Yesterday's News: Benchmarking Multi-Dimensional Out-of-Distribution Generalisation of Misinformation Detection Models | Oct 12, 2024 | Benchmarking, Misinformation | Code Available | 0 |
| FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Aug 27, 2024 | Benchmarking, Decoder | Code Available | 0 |