| Benchmarking AutoML algorithms on a collection of synthetic classification problems | Dec 6, 2022 | AutoML, Benchmarking | Code Available | 0 | 5 |
| JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models | May 23, 2025 | Benchmarking, Diversity | Code Available | 0 | 5 |
| Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Feb 11, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 | 5 |
| DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Apr 10, 2024 | Benchmarking, knowledge editing | Code Available | 0 | 5 |
| JATE 2.0: Java Automatic Term Extraction with Apache Solr | May 1, 2016 | Benchmarking, Term Extraction | Code Available | 0 | 5 |
| Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Dec 27, 2023 | Benchmarking, Imputation | Code Available | 0 | 5 |
| IoT Data Trust Evaluation via Machine Learning | Aug 15, 2023 | Benchmarking, Time Series | Code Available | 0 | 5 |
| IPC: A Benchmark Data Set for Learning with Graph-Structured Data | May 15, 2019 | Benchmarking, Graph Classification | Code Available | 0 | 5 |
| Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench | Apr 1, 2025 | Benchmarking | Code Available | 0 | 5 |
| InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Oct 18, 2023 | Benchmarking, Visual Grounding | Code Available | 0 | 5 |
| IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Jan 8, 2025 | Benchmarking | Code Available | 0 | 5 |
| ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images | Oct 22, 2024 | Benchmarking, Self-Supervised Learning | Code Available | 0 | 5 |
| Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Jul 13, 2021 | Benchmarking, Decision Making | Code Available | 0 | 5 |
| Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Sep 22, 2024 | AutoML, Benchmarking | Code Available | 0 | 5 |
| Can geometric combinatorics improve RNA branching predictions? | Mar 26, 2025 | Benchmarking | Code Available | 0 | 5 |
| Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Oct 8, 2014 | Benchmarking | Code Available | 0 | 5 |
| Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation | Jun 2, 2019 | Benchmarking, Deep Reinforcement Learning | Code Available | 0 | 5 |
| Can a single neuron learn predictive uncertainty? | Jun 7, 2021 | Benchmarking, Conformal Prediction | Code Available | 0 | 5 |
| Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim Evidence Reasoning | Jun 9, 2025 | Benchmarking, Diagnostic | Code Available | 0 | 5 |
| Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Mar 11, 2025 | Benchmarking, Hyperparameter Optimization | Code Available | 0 | 5 |
| Integrating Expert Knowledge into Logical Programs via LLMs | Feb 17, 2025 | Benchmarking, Logical Reasoning | Code Available | 0 | 5 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | Benchmarking, Code Generation | Code Available | 0 | 5 |
| Analyzing the Feature Extractor Networks for Face Image Synthesis | Jun 4, 2024 | Benchmarking, Image Generation | Code Available | 0 | 5 |
| InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Dec 23, 2021 | Benchmarking, Deep Learning | Code Available | 0 | 5 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 | 5 |