| Classical ensemble of Quantum-classical ML algorithms for Phishing detection in Ethereum transaction networks | Oct 30, 2022 | Anomaly Detection, Benchmarking | Code Available | 0 |
| CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers? | Mar 27, 2025 | Benchmarking, Specificity | Code Available | 0 |
| TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability | Jun 4, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| Technical Report on the CleverHans v2.1.0 Adversarial Examples Library | Oct 3, 2016 | Adversarial Attack, Adversarial Defense | Code Available | 0 |
| A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge | May 8, 2025 | Benchmarking | Code Available | 0 |
| A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment | Feb 13, 2023 | Benchmarking, Segmentation | Code Available | 0 |
| Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Nov 1, 2024 | Benchmarking, Semantic Segmentation | Code Available | 0 |
| TSPP: A Unified Benchmarking Tool for Time-series Forecasting | Dec 28, 2023 | Benchmarking, Feature Engineering | Code Available | 0 |
| City-Scale Road Audit System using Deep Learning | Nov 26, 2018 | Benchmarking, Deep Learning | Code Available | 0 |
| Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift | Apr 19, 2022 | Benchmarking, Classification | Code Available | 0 |
| Advancing and Benchmarking Personalized Tool Invocation for LLMs | May 7, 2025 | Benchmarking, World Knowledge | Code Available | 0 |
| CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing | Jun 30, 2021 | Benchmarking, Transfer Learning | Code Available | 0 |
| Chumor 2.0: Towards Benchmarking Chinese Humor Understanding | Dec 23, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs | May 26, 2025 | Benchmarking, Fault localization | Code Available | 0 |
| Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning | May 19, 2025 | Benchmarking | Code Available | 0 |
| Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems | Oct 14, 2023 | Benchmarking | Code Available | 0 |
| Random Machines: A bagged-weighted support vector model with free kernel choice | Nov 21, 2019 | Benchmarking, regression | Code Available | 0 |
| TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Oct 5, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | Nov 23, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Ranking and benchmarking framework for sampling algorithms on synthetic data streams | Jun 17, 2020 | Benchmarking, Hyperparameter Optimization | Code Available | 0 |
| QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules | Jun 20, 2024 | Benchmarking | Code Available | 0 |
| Tunability: Importance of Hyperparameters of Machine Learning Algorithms | Feb 26, 2018 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Temporal receptive field in dynamic graph learning: A comprehensive analysis | Jul 17, 2024 | Benchmarking, Dynamic Link Prediction | Code Available | 0 |
| A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability | Feb 3, 2020 | Benchmarking, Discrete Choice Models | Code Available | 0 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 |