| Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Dec 11, 2024 | AttributeBenchmarking | CodeCode Available | 1 | 5 |
| Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers | Jul 3, 2020 | BenchmarkingDeep Learning | CodeCode Available | 1 | 5 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Aug 31, 2023 | BenchmarkingDataset Generation | CodeCode Available | 1 | 5 |
| DependEval: Benchmarking LLMs for Repository Dependency Understanding | Mar 9, 2025 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Jun 5, 2023 | BenchmarkingMultiple-choice | CodeCode Available | 1 | 5 |
| Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Feb 28, 2024 | BenchmarkingMultiple-choice | CodeCode Available | 1 | 5 |
| Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks | Aug 18, 2019 | BenchmarkingImage Classification | CodeCode Available | 1 | 5 |
| Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection | Jun 25, 2024 | BenchmarkingPrompt Learning | CodeCode Available | 1 | 5 |
| Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Jan 1, 2024 | BenchmarkingInstruction Following | CodeCode Available | 1 | 5 |
| Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods | Sep 17, 2022 | BenchmarkingStereo Matching | CodeCode Available | 1 | 5 |
| Delving into Out-of-Distribution Detection with Medical Vision-Language Models | Mar 2, 2025 | Benchmarkingimage-classification | CodeCode Available | 1 | 5 |
| DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Mar 20, 2023 | BenchmarkingDe-identification | CodeCode Available | 1 | 5 |
| MatTools: Benchmarking Large Language Models for Materials Science Tools | May 16, 2025 | BenchmarkingQuestion Answering | CodeCode Available | 1 | 5 |
| Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking | Feb 19, 2021 | BenchmarkingOpenAI Gym | CodeCode Available | 1 | 5 |
| RobFR: Benchmarking Adversarial Robustness on Face Recognition | Jul 8, 2020 | Adversarial RobustnessBenchmarking | CodeCode Available | 1 | 5 |
| DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | May 9, 2023 | BenchmarkingDecision Making | CodeCode Available | 1 | 5 |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Apr 3, 2024 | BenchmarkingGeneral Knowledge | CodeCode Available | 1 | 5 |
| Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages | Mar 11, 2024 | BenchmarkingData Augmentation | CodeCode Available | 1 | 5 |
| Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL | Apr 28, 2020 | AllBenchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Dec 13, 2022 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| Deep Learning-Based Synchronization for Uplink NB-IoT | May 22, 2022 | BenchmarkingDeep Learning | CodeCode Available | 1 | 5 |
| Deep learning model solves change point detection for multiple change types | Apr 15, 2022 | BenchmarkingChange Point Detection | CodeCode Available | 1 | 5 |
| ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization | Jul 19, 2022 | BenchmarkingImage Registration | CodeCode Available | 1 | 5 |
| Benchmarking Large Language Models for News Summarization | Jan 31, 2023 | BenchmarkingNews Summarization | CodeCode Available | 1 | 5 |
| Benchmarking Language Models for Code Syntax Understanding | Oct 26, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| A Critical Assessment of State-of-the-Art in Entity Alignment | Oct 30, 2020 | BenchmarkingEntity Alignment | CodeCode Available | 1 | 5 |
| dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing | Apr 27, 2021 | BenchmarkingRetrieval | CodeCode Available | 1 | 5 |
| DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation | Oct 11, 2022 | 6D Pose Estimation6D Pose Estimation using RGB | CodeCode Available | 1 | 5 |
| Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation | Feb 18, 2024 | BenchmarkingLanguage Modeling | CodeCode Available | 1 | 5 |
| Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models | May 19, 2025 | BenchmarkingChatbot | CodeCode Available | 1 | 5 |
| Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory | Jul 20, 2023 | BenchmarkingDecision Making | CodeCode Available | 1 | 5 |
| Benchmarking Image Retrieval for Visual Localization | Nov 24, 2020 | Autonomous DrivingBenchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Mar 28, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking LLMs' Swarm intelligence | May 7, 2025 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Dec 10, 2021 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Language Model Creativity: A Case Study on Code Generation | Jul 12, 2024 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets | Apr 11, 2022 | Action Triplet RecognitionBenchmarking | CodeCode Available | 1 | 5 |
| Decoding the Underlying Meaning of Multimodal Hateful Memes | May 28, 2023 | BenchmarkingHateful Meme Classification | CodeCode Available | 1 | 5 |
| DFGC 2021: A DeepFake Game Competition | Jun 2, 2021 | BenchmarkingDeepFake Detection | CodeCode Available | 1 | 5 |
| AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Oct 31, 2024 | BenchmarkingCloud Removal | CodeCode Available | 1 | 5 |
| Data-Driven Denoising of Stationary Accelerometer Signals | Jun 13, 2022 | BenchmarkingDenoising | CodeCode Available | 1 | 5 |
| D2S: Document-to-Slide Generation Via Query-Based Text Summarization | May 8, 2021 | BenchmarkingLong Form Question Answering | CodeCode Available | 1 | 5 |
| DACBench: A Benchmark Library for Dynamic Algorithm Configuration | May 18, 2021 | Benchmarking | CodeCode Available | 1 | 5 |
| Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data | Apr 16, 2021 | BenchmarkingCausal Discovery | CodeCode Available | 1 | 5 |
| Align and Distill: Unifying and Improving Domain Adaptive Object Detection | Mar 18, 2024 | Benchmarkingobject-detection | CodeCode Available | 1 | 5 |
| Benchmarking Graph Neural Networks on Dynamic Link Prediction | Sep 29, 2021 | BenchmarkingDynamic Link Prediction | CodeCode Available | 1 | 5 |
| Curious Hierarchical Actor-Critic Reinforcement Learning | May 7, 2020 | BenchmarkingHierarchical Reinforcement Learning | CodeCode Available | 1 | 5 |
| CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Jan 2, 2025 | BenchmarkingComputer Security | CodeCode Available | 1 | 5 |
| DataRec: A Python Library for Standardized and Reproducible Data Management in Recommender Systems | Oct 30, 2024 | BenchmarkingManagement | CodeCode Available | 1 | 5 |
| CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Oct 23, 2023 | Benchmarking | CodeCode Available | 1 | 5 |