| Hard-Label Cryptanalytic Extraction of Neural Network Models | Sep 18, 2024 | Benchmarking | Code Available | 0 |
| Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces | May 31, 2023 | Benchmarking, Recommendation Systems | Code Available | 0 |
| Benchmarking Top-K Keyword and Top-K Document Processing with T^2K^2 and T^2K^2D^2 | Apr 20, 2018 | Benchmarking | Code Available | 0 |
| HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios | Dec 21, 2024 | Benchmarking | Code Available | 0 |
| MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators | May 28, 2025 | Benchmarking, Chatbot | Code Available | 0 |
| MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks | May 6, 2025 | Benchmarking, Multiple-choice | Code Available | 0 |
| Benchmarking tools for a priori identifiability analysis | Jul 20, 2022 | Benchmarking | Code Available | 0 |
| MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book | Jun 1, 2025 | Benchmarking | Code Available | 0 |
| Benchmarking time series classification -- Functional data vs machine learning approaches | Nov 18, 2019 | Additive models, Benchmarking | Code Available | 0 |
| Benchmarking the Robustness of UAV Tracking Against Common Corruptions | Mar 18, 2024 | Benchmarking | Code Available | 0 |
| Roughness Index and Roughness Distance for Benchmarking Medical Segmentation | Mar 23, 2021 | Benchmarking, Image Segmentation | Code Available | 0 |
| The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns | Feb 27, 2024 | Benchmarking, Binary Classification | Code Available | 0 |
| MEDFAIR: Benchmarking Fairness for Medical Imaging | Oct 4, 2022 | Benchmarking, Fairness | Code Available | 0 |
| Benchmarking the Robustness of Optical Flow Estimation to Corruptions | Nov 22, 2024 | Autonomous Driving, Benchmarking | Code Available | 0 |
| Adaptive Power System Emergency Control using Deep Reinforcement Learning | Mar 9, 2019 | Benchmarking, Deep Reinforcement Learning | Code Available | 0 |
| Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch | Feb 20, 2022 | Benchmarking | Code Available | 0 |
| gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo | Mar 14, 2019 | Benchmarking, OpenAI Gym | Code Available | 0 |
| Benchmarking the Hooke-Jeeves Method, MTS-LS1, and BSrr on the Large-scale BBOB Function Set | Apr 28, 2022 | Benchmarking | Code Available | 0 |
| Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps | Jan 8, 2019 | Benchmarking, CPU | Code Available | 0 |
| Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | May 13, 2025 | Benchmarking, Diagnostic | Code Available | 0 |
| The KiTS19 Challenge Data: 300 Kidney Tumor Cases with Clinical Context, CT Semantic Segmentations, and Surgical Outcomes | Mar 31, 2019 | Benchmarking, Computed Tomography (CT) | Code Available | 0 |
| RTSeg: Real-time Semantic Segmentation Comparative Study | Mar 7, 2018 | Autonomous Driving, Benchmarking | Code Available | 0 |
| Meet Spinky: An Open-Source Spindle and K-Complex Detection Toolbox Validated on the Open-Access Montreal Archive of Sleep Studies (MASS). | Mar 2, 2017 | Benchmarking, EEG | Code Available | 0 |
| Benchmarking the Hill-Valley Evolutionary Algorithm for the GECCO 2018 Competition on Niching Methods Multimodal Optimization | Jun 30, 2018 | Benchmarking | Code Available | 0 |
| Grounded Intuition of GPT-Vision's Abilities with Scientific Images | Nov 3, 2023 | Benchmarking, counterfactual | Code Available | 0 |
| GRATIS: GeneRAting TIme Series with diverse and controllable characteristics | Mar 7, 2019 | Benchmarking, Clustering | Code Available | 0 |
| Understanding the World's Museums through Vision-Language Reasoning | Dec 2, 2024 | Benchmarking, Question Answering | Code Available | 0 |
| RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models | Jun 16, 2024 | Benchmarking | Code Available | 0 |
| Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis | Mar 18, 2022 | Benchmarking, Object Recognition | Code Available | 0 |
| Benchmarking the Fairness of Image Upsampling Methods | Jan 24, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Graph-theoretical approach to robust 3D normal extraction of LiDAR data | May 23, 2022 | Benchmarking | Code Available | 0 |
| A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations | Dec 16, 2021 | Benchmarking | Code Available | 0 |
| Messing Up 3D Virtual Environments: Transferable Adversarial 3D Objects | Sep 17, 2021 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective | Dec 10, 2024 | Benchmarking | Code Available | 0 |
| Meta-Black-Box-Optimization through Offline Q-function Learning | May 4, 2025 | Benchmarking, Mamba | Code Available | 0 |
| Learning Conjoint Attentions for Graph Neural Nets | Feb 5, 2021 | Benchmarking, Graph Attention | Code Available | 0 |
| Graph Convolutional Networks Meet with High Dimensionality Reduction | Nov 7, 2019 | Benchmarking, Dimensionality Reduction | Code Available | 0 |
| Benchmarking the Attribution Quality of Vision Models | Jul 16, 2024 | Benchmarking, Explainable Models | Code Available | 0 |
| MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs | May 30, 2025 | Benchmarking | Code Available | 0 |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking | May 24, 2023 | Benchmarking, Graph Mining | Code Available | 0 |
| MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication | Jun 22, 2024 | Benchmarking, Meta-Learning | Code Available | 0 |
| S3Simulator: A benchmarking Side Scan Sonar Simulator dataset for Underwater Image Analysis | Aug 23, 2024 | Benchmarking | Code Available | 0 |
| Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Jan 31, 2024 | Benchmarking, Change Detection | Code Available | 0 |
| GOAL: Towards Benchmarking Few-Shot Sports Game Summarization | Jul 18, 2022 | Benchmarking | Code Available | 0 |
| SACoD: Sensor Algorithm Co-Design Towards Efficient CNN-powered Intelligent PhlatCam | Jan 1, 2021 | Benchmarking, Model Compression | Code Available | 0 |
| GNNMerge: Merging of GNN Models Without Accessing Training Data | Mar 5, 2025 | Benchmarking, Computational Efficiency | Code Available | 0 |
| Meta-survey on outlier and anomaly detection | Dec 12, 2023 | Anomaly Detection, Benchmarking | Code Available | 0 |
| The Legal Argument Reasoning Task in Civil Procedure | Nov 5, 2022 | Benchmarking | Code Available | 0 |
| A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning | Jan 29, 2019 | Benchmarking, Deep Learning | Code Available | 0 |
| Safe Multi-Agent Navigation guided by Goal-Conditioned Safe Reinforcement Learning | Feb 25, 2025 | Benchmarking, Reinforcement Learning (RL) | Code Available | 0 |