| Hard-Label Cryptanalytic Extraction of Neural Network Models | Sep 18, 2024 | Benchmarking | Code Available | 0 |
| Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces | May 31, 2023 | Benchmarking, Recommendation Systems | Code Available | 0 |
| Benchmarking Top-K Keyword and Top-K Document Processing with T^2K^2 and T^2K^2D^2 | Apr 20, 2018 | Benchmarking | Code Available | 0 |
| HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios | Dec 21, 2024 | Benchmarking | Code Available | 0 |
| MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators | May 28, 2025 | Benchmarking, Chatbot | Code Available | 0 |
| MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks | May 6, 2025 | Benchmarking, Multiple-choice | Code Available | 0 |
| Benchmarking tools for a priori identifiability analysis | Jul 20, 2022 | Benchmarking | Code Available | 0 |
| MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book | Jun 1, 2025 | Benchmarking | Code Available | 0 |
| Benchmarking time series classification -- Functional data vs machine learning approaches | Nov 18, 2019 | Additive models, Benchmarking | Code Available | 0 |
| Benchmarking the Robustness of UAV Tracking Against Common Corruptions | Mar 18, 2024 | Benchmarking | Code Available | 0 |