| Strong and Simple Baselines for Multimodal Utterance Embeddings | May 14, 2019 | Benchmarking | Code Available | 0 | 5 |
| Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams | Jun 17, 2024 | All, Benchmarking | Code Available | 0 | 5 |
| DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Jun 8, 2023 | Benchmarking, Fairness | Code Available | 0 | 5 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 | 5 |
| Benchmarking Large Language Models for Image Classification of Marine Mammals | Oct 22, 2024 | Benchmarking, image-classification | Code Available | 0 | 5 |
| Flexible Generation of Preference Data for Recommendation Analysis | Jul 23, 2024 | Benchmarking, Recommendation Systems | Code Available | 0 | 5 |
| Divergent Creativity in Humans and Large Language Models | May 13, 2024 | Benchmarking | Code Available | 0 | 5 |
| Local manifold learning and its link to domain-based physics knowledge | Jul 1, 2022 | Benchmarking, Dimensionality Reduction | Code Available | 0 | 5 |
| Distributional Depth-Based Estimation of Object Articulation Models | Aug 12, 2021 | Benchmarking, Object | Code Available | 0 | 5 |
| Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation | Oct 29, 2021 | Benchmarking, Brain Tumor Segmentation | Code Available | 0 | 5 |
| A Framework for Generating Informative Benchmark Instances | May 29, 2022 | Benchmarking | Code Available | 0 | 5 |
| GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Jan 26, 2025 | Benchmarking, Diversity | Code Available | 0 | 5 |
| A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Voice | Dec 20, 2024 | Benchmarking, Diagnostic | Code Available | 0 | 5 |
| Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability | Feb 18, 2020 | Benchmarking, Federated Learning | Code Available | 0 | 5 |
| Generalization and Regularization in DQN | Sep 29, 2018 | Atari Games, Benchmarking | Code Available | 0 | 5 |
| Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI | Mar 7, 2024 | Benchmarking | Code Available | 0 | 5 |
| exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem | Feb 11, 2025 | Benchmarking, Diversity | Code Available | 0 | 5 |
| Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Aug 2, 2024 | Benchmarking, multimodal interaction | Code Available | 0 | 5 |
| Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection | Aug 22, 2023 | Benchmarking, Out-of-Distribution Detection | Code Available | 0 | 5 |
| Experimental Analysis of Large-scale Learnable Vector Storage Compression | Nov 27, 2023 | Benchmarking | Code Available | 0 | 5 |
| Benchmarking Large Language Models for Molecule Prediction Tasks | Mar 8, 2024 | Benchmarking, Prediction | Code Available | 0 | 5 |
| DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions | May 8, 2025 | Autonomous Navigation, Benchmarking | Code Available | 0 | 5 |
| Are Large Language Models Good at Utility Judgments? | Mar 28, 2024 | Answer Generation, Benchmarking | Code Available | 0 | 5 |
| DispaRisk: Auditing Fairness Through Usable Information | May 20, 2024 | Benchmarking, Bias Detection | Code Available | 0 | 5 |
| GenderBench: Evaluation Suite for Gender Biases in LLMs | May 17, 2025 | Benchmarking | Code Available | 0 | 5 |