| ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Aug 4, 2023 | Benchmarking, Information Retrieval | Code Available | 0 |
| RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions | Jan 1, 2025 | Benchmarking | Code Available | 0 |
| TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Oct 7, 2024 | Benchmarking, Segmentation | Code Available | 0 |
| RDF-star2Vec: RDF-star Graph Embeddings for Data Mining | Dec 25, 2023 | Benchmarking, Graph Embedding | Code Available | 0 |
| 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Mar 22, 2025 | Benchmarking, Object | Code Available | 0 |
| An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines | Apr 2, 2021 | Benchmarking | Code Available | 0 |
| Characterizing SLAM Benchmarks and Methods for the Robust Perception Age | May 19, 2019 | Benchmarking | Code Available | 0 |
| Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge | Apr 10, 2025 | Adversarial Robustness, Benchmarking | Code Available | 0 |
| Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese | May 28, 2025 | Benchmarking | Code Available | 0 |
| Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs | Apr 21, 2022 | Benchmarking, Change Point Detection | Code Available | 0 |
| TuringQ: Benchmarking AI Comprehension in Theory of Computation | Oct 9, 2024 | Benchmarking | Code Available | 0 |
| An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations | Aug 26, 2019 | Benchmarking, Clustering | Code Available | 0 |
| TweetNERD -- End to End Entity Linking Benchmark for Tweets | Oct 14, 2022 | Benchmarking, Entity Linking | Code Available | 0 |
| Real-time cryo-EM data pre-processing with Warp | Jun 14, 2018 | Benchmarking, Image Reconstruction | Code Available | 0 |
| Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets | Jul 19, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences | Nov 30, 2024 | Benchmarking, Classification | Code Available | 0 |
| Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence | Apr 10, 2023 | Benchmarking, speech-recognition | Code Available | 0 |
| Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | May 28, 2025 | Benchmarking, Memorization | Code Available | 0 |
| An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation | Feb 21, 2023 | Benchmarking, State Estimation | Code Available | 0 |
| ACCORD: Closing the Commonsense Measurability Gap | Jun 4, 2024 | Benchmarking, Common Sense Reasoning | Code Available | 0 |
| Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions | Jul 28, 2017 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness | Oct 28, 2023 | Benchmarking, image-classification | Code Available | 0 |
| Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box | Mar 4, 2022 | Benchmarking, counterfactual | Code Available | 0 |