| Delving into Out-of-Distribution Detection with Medical Vision-Language Models | Mar 2, 2025 | Benchmarkingimage-classification | CodeCode Available | 1 | 5 |
| Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks | Aug 18, 2019 | BenchmarkingImage Classification | CodeCode Available | 1 | 5 |
| DependEval: Benchmarking LLMs for Repository Dependency Understanding | Mar 9, 2025 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Jul 28, 2023 | Benchmarkingreinforcement-learning | CodeCode Available | 1 | 5 |
| Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Mar 24, 2025 | BenchmarkingHumanitarian | CodeCode Available | 1 | 5 |
| Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers | Jan 1, 2021 | BenchmarkingDeep Learning | CodeCode Available | 1 | 5 |
| Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition | May 18, 2021 | Action RecognitionAction Recognition In Videos | CodeCode Available | 1 | 5 |
| Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging | Apr 22, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| Detecting beats in the photoplethysmogram: benchmarking open-source algorithms | Jul 19, 2022 | BenchmarkingPhotoplethysmography (PPG) beat detection | CodeCode Available | 1 | 5 |
| MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery | Feb 18, 2022 | BenchmarkingRepresentation Learning | CodeCode Available | 1 | 5 |
| Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments | Apr 27, 2024 | Autonomous VehiclesBenchmarking | CodeCode Available | 1 | 5 |
| API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs | Feb 23, 2024 | Benchmarkingslot-filling | CodeCode Available | 1 | 5 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Aug 31, 2023 | BenchmarkingDataset Generation | CodeCode Available | 1 | 5 |
| Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Feb 21, 2025 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding | May 21, 2024 | BenchmarkingKeypoint Detection | CodeCode Available | 1 | 5 |
| Explainable Benchmarking for Iterative Optimization Heuristics | Jan 31, 2024 | BenchmarkingEvolutionary Algorithms | CodeCode Available | 1 | 5 |
| DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations | Oct 17, 2023 | BenchmarkingEmotion Recognition | CodeCode Available | 1 | 5 |
| NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks | Oct 12, 2021 | Benchmarkingimage-classification | CodeCode Available | 1 | 5 |
| NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search | Jun 18, 2022 | BenchmarkingGraph Neural Network | CodeCode Available | 1 | 5 |
| Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Jul 4, 2018 | Adversarial DefenseBenchmarking | CodeCode Available | 1 | 5 |
| Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT | Jul 9, 2021 | BenchmarkingDocument Classification | CodeCode Available | 1 | 5 |
| DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Aug 11, 2023 | BenchmarkingDiversity | CodeCode Available | 1 | 5 |
| DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information | Jan 10, 2025 | BenchmarkingData Augmentation | CodeCode Available | 1 | 5 |
| Protein Structure Tokenization: Benchmarking and New Recipe | Feb 28, 2025 | BenchmarkingLanguage Modeling | CodeCode Available | 1 | 5 |
| Benchmarking Neural Network Generalization for Grammar Induction | Aug 16, 2023 | Benchmarking | CodeCode Available | 1 | 5 |