| Benchmarking projective simulation in navigation problems | Apr 23, 2018 | Benchmarking, Q-Learning | —Unverified | 0 | 0 |
| Benchmarking Processor Performance by Multi-Threaded Machine Learning Algorithms | Sep 11, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| JuStRank: Benchmarking LLM Judges for System Ranking | Dec 12, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images | Dec 12, 2023 | Benchmarking, Retrieval | —Unverified | 0 | 0 |
| Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling | Jan 6, 2022 | Aerial Scene Classification, Benchmarking | —Unverified | 0 | 0 |
| AERF: Adaptive ensemble random fuzzy algorithm for anomaly detection in cloud computing | Jan 9, 2023 | Anomaly Detection, Benchmarking | —Unverified | 0 | 0 |
| THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models | Apr 17, 2025 | Benchmarking, Math | —Unverified | 0 | 0 |
| Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks | May 24, 2024 | Benchmarking, Decoder | —Unverified | 0 | 0 |
| KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making | Jul 31, 2024 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| Keras Sig: Efficient Path Signature Computation on GPU in Keras 3 | Jan 14, 2025 | Benchmarking, C++ code | —Unverified | 0 | 0 |
| KetGPT -- Dataset Augmentation of Quantum Circuits using Transformers | Feb 20, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy | Dec 4, 2024 | Anatomy, Benchmarking | —Unverified | 0 | 0 |
| Classification of Single-View Object Point Clouds | Dec 18, 2020 | 3D Object Classification, 6D Pose Estimation using RGB | —Unverified | 0 | 0 |
| Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design | Apr 14, 2025 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition | Mar 24, 2025 | Benchmarking, Food Recognition | —Unverified | 0 | 0 |
| Benchmarking Poisoning Attacks against Retrieval-Augmented Generation | May 24, 2025 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| Benchmarking person re-identification approaches and training datasets for practical real-world implementations | Sep 29, 2021 | Benchmarking, Pedestrian Detection | —Unverified | 0 | 0 |
| Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations | Aug 3, 2024 | Benchmarking, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| Knowledge-aware contrastive heterogeneous molecular graph learning | Feb 17, 2025 | Benchmarking, Contrastive Learning | —Unverified | 0 | 0 |
| AEON: Adaptive Estimation of Instance-Dependent In-Distribution and Out-of-Distribution Label Noise for Robust Learning | Jan 23, 2025 | Benchmarking, image-classification | —Unverified | 0 | 0 |
| TIIF-Bench: How Does Your T2I Model Follow Your Instructions? | Jun 2, 2025 | Benchmarking, Instruction Following | —Unverified | 0 | 0 |
| Knowledge Sharing in Manufacturing using Large Language Models: User Evaluation and Model Benchmarking | Jan 10, 2024 | Benchmarking, Information Retrieval | —Unverified | 0 | 0 |
| 3D Compositional Zero-shot Learning with DeCompositional Consensus | Nov 29, 2021 | Benchmarking, Compositional Zero-Shot Learning | —Unverified | 0 | 0 |
| Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems | Jul 27, 2023 | Benchmarking, GPU | —Unverified | 0 | 0 |
| Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges | Mar 6, 2025 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Benchmarking Pedestrian Odometry: The Brown Pedestrian Odometry Dataset (BPOD) | Dec 24, 2021 | Benchmarking, Position | —Unverified | 0 | 0 |
| Benchmarking PathCLIP for Pathology Image Analysis | Jan 5, 2024 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| Kolmogorov-Arnold Network for Transistor Compact Modeling | Mar 19, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Koopman Theory-Inspired Method for Learning Time Advancement Operators in Unstable Flame Front Evolution | Dec 11, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex | Jun 16, 2024 | Benchmarking, Object Recognition | —Unverified | 0 | 0 |
| KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models | May 22, 2025 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning | May 14, 2025 | Benchmarking, MMLU | —Unverified | 0 | 0 |
| K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences | Aug 26, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection | May 8, 2025 | Benchmarking, Out-of-Distribution Generalization | —Unverified | 0 | 0 |
| Benchmarking Open-Source Large Language Models on Healthcare Text Classification Tasks | Mar 19, 2025 | Benchmarking, Domain Adaptation | —Unverified | 0 | 0 |
| L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi | Nov 21, 2022 | Benchmarking, Machine Translation | —Unverified | 0 | 0 |
| L3 Fusion: Fast Transformed Convolutions on CPUs | Dec 4, 2019 | Benchmarking | —Unverified | 0 | 0 |
| Advocating Character Error Rate for Multilingual ASR Evaluation | Oct 9, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Label Anchored Contrastive Learning for Language Understanding | Apr 26, 2022 | Benchmarking, Contrastive Learning | —Unverified | 0 | 0 |
| Comparison of Open-Source and Proprietary LLMs for Machine Reading Comprehension: A Practical Analysis for Industrial Applications | Jun 19, 2024 | Benchmarking, Machine Reading Comprehension | —Unverified | 0 | 0 |
| Label-Efficient Point Cloud Semantic Segmentation: An Active Learning Approach | Jan 18, 2021 | Active Learning, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models | Dec 6, 2024 | Benchmarking, Dialogue Understanding | —Unverified | 0 | 0 |
| AI Cyber Risk Benchmark: Automated Exploitation Capabilities | Oct 29, 2024 | Benchmarking, Vulnerability Detection | —Unverified | 0 | 0 |
| λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics | Nov 28, 2024 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs | Oct 18, 2024 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection | Sep 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama | Mar 14, 2025 | Benchmarking, MMLU | —Unverified | 0 | 0 |
| Benchmarking Online Sequence-to-Sequence and Character-based Handwriting Recognition from IMU-Enhanced Pens | Feb 14, 2022 | Benchmarking, Handwriting Recognition | —Unverified | 0 | 0 |
| Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time | Sep 20, 2024 | Benchmarking, World Knowledge | —Unverified | 0 | 0 |
| Benchmarking Online Object Trackers for Underwater Robot Position Locking Applications | Feb 23, 2025 | Benchmarking, Object Tracking | —Unverified | 0 | 0 |