| DHP Benchmark: Are LLMs Good NLG Evaluators? | Aug 25, 2024 | Benchmarking, NLG Evaluation | —Unverified | 0 |
| Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking | May 29, 2025 | Benchmarking, Graph Question Answering | —Unverified | 0 |
| Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy | Apr 14, 2023 | Benchmarking | —Unverified | 0 |
| DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale | Jan 23, 2025 | Benchmarking | —Unverified | 0 |
| DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs | May 15, 2025 | Benchmarking, Fairness | —Unverified | 0 |
| Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset | Dec 9, 2024 | Benchmarking, Diffusion MRI | —Unverified | 0 |
| DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior | Apr 4, 2024 | Benchmarking, Image Restoration | —Unverified | 0 |
| Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML | Nov 17, 2024 | Benchmarking, Fairness | —Unverified | 0 |
| Differential Privacy for Adaptive Weight Aggregation in Federated Tumor Segmentation | Aug 1, 2023 | Benchmarking, Brain Tumor Segmentation | —Unverified | 0 |
| Diffusion-Driven Domain Adaptation for Generating 3D Molecules | Apr 1, 2024 | Benchmarking, Decoder | —Unverified | 0 |