| Classical ensemble of Quantum-classical ML algorithms for Phishing detection in Ethereum transaction networks | Oct 30, 2022 | Anomaly Detection, Benchmarking | Code Available | 0 |
| CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers? | Mar 27, 2025 | Benchmarking, Specificity | Code Available | 0 |
| TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability | Jun 4, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| Technical Report on the CleverHans v2.1.0 Adversarial Examples Library | Oct 3, 2016 | Adversarial Attack, Adversarial Defense | Code Available | 0 |
| A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge | May 8, 2025 | Benchmarking | Code Available | 0 |
| A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment | Feb 13, 2023 | Benchmarking, Segmentation | Code Available | 0 |
| Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Nov 1, 2024 | Benchmarking, Semantic Segmentation | Code Available | 0 |
| TSPP: A Unified Benchmarking Tool for Time-series Forecasting | Dec 28, 2023 | Benchmarking, Feature Engineering | Code Available | 0 |
| City-Scale Road Audit System using Deep Learning | Nov 26, 2018 | Benchmarking, Deep Learning | Code Available | 0 |
| Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift | Apr 19, 2022 | Benchmarking, Classification | Code Available | 0 |
| Advancing and Benchmarking Personalized Tool Invocation for LLMs | May 7, 2025 | Benchmarking, World Knowledge | Code Available | 0 |
| CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing | Jun 30, 2021 | Benchmarking, Transfer Learning | Code Available | 0 |
| Chumor 2.0: Towards Benchmarking Chinese Humor Understanding | Dec 23, 2024 | Benchmarking | Code Available | 0 |
| Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs | May 26, 2025 | Benchmarking, Fault localization | Code Available | 0 |
| Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning | May 19, 2025 | Benchmarking | Code Available | 0 |
| Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems | Oct 14, 2023 | Benchmarking | Code Available | 0 |
| Random Machines: A bagged-weighted support vector model with free kernel choice | Nov 21, 2019 | Benchmarking, regression | Code Available | 0 |
| TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Oct 5, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | Nov 23, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Ranking and benchmarking framework for sampling algorithms on synthetic data streams | Jun 17, 2020 | Benchmarking, Hyperparameter Optimization | Code Available | 0 |
| QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules | Jun 20, 2024 | Benchmarking | Code Available | 0 |
| Tunability: Importance of Hyperparameters of Machine Learning Algorithms | Feb 26, 2018 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Temporal receptive field in dynamic graph learning: A comprehensive analysis | Jul 17, 2024 | Benchmarking, Dynamic Link Prediction | Code Available | 0 |
| A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability | Feb 3, 2020 | Benchmarking, Discrete Choice Models | Code Available | 0 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 |
| ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Aug 4, 2023 | Benchmarking, Information Retrieval | Code Available | 0 |
| RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions | Jan 1, 2025 | Benchmarking | Code Available | 0 |
| TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Oct 7, 2024 | Benchmarking, Segmentation | Code Available | 0 |
| RDF-star2Vec: RDF-star Graph Embeddings for Data Mining | Dec 25, 2023 | Benchmarking, Graph Embedding | Code Available | 0 |
| 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Mar 22, 2025 | Benchmarking, Object | Code Available | 0 |
| An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines | Apr 2, 2021 | Benchmarking | Code Available | 0 |
| Characterizing SLAM Benchmarks and Methods for the Robust Perception Age | May 19, 2019 | Benchmarking | Code Available | 0 |
| Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Apr 10, 2025 | Adversarial Robustness, Benchmarking | Code Available | 0 |
| Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese | May 28, 2025 | Benchmarking | Code Available | 0 |
| Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs | Apr 21, 2022 | Benchmarking, Change Point Detection | Code Available | 0 |
| TuringQ: Benchmarking AI Comprehension in Theory of Computation | Oct 9, 2024 | Benchmarking | Code Available | 0 |
| An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations | Aug 26, 2019 | Benchmarking, Clustering | Code Available | 0 |
| TweetNERD -- End to End Entity Linking Benchmark for Tweets | Oct 14, 2022 | Benchmarking, Entity Linking | Code Available | 0 |
| Real-time cryo-EM data pre-processing with Warp | Jun 14, 2018 | Benchmarking, Image Reconstruction | Code Available | 0 |
| Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets | Jul 19, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences | Nov 30, 2024 | Benchmarking, Classification | Code Available | 0 |
| Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence | Apr 10, 2023 | Benchmarking, speech-recognition | Code Available | 0 |
| Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | May 28, 2025 | Benchmarking, Memorization | Code Available | 0 |
| An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation | Feb 21, 2023 | Benchmarking, State Estimation | Code Available | 0 |
| ACCORD: Closing the Commonsense Measurability Gap | Jun 4, 2024 | Benchmarking, Common Sense Reasoning | Code Available | 0 |
| Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions | Jul 28, 2017 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness | Oct 28, 2023 | Benchmarking, image-classification | Code Available | 0 |
| Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box | Mar 4, 2022 | Benchmarking, counterfactual | Code Available | 0 |