| Benchmarking Predictive Coding Networks -- Made Simple | Jul 1, 2024 | Benchmarking | Code Available | 2 |
| AI Agents That Matter | Jul 1, 2024 | Benchmarking | Code Available | 1 |
| Overcoming Common Flaws in the Evaluation of Selective Classification Systems | Jul 1, 2024 | Benchmarking, Classification | Code Available | 1 |
| Commute Graph Neural Networks | Jun 30, 2024 | Benchmarking | —Unverified | 0 |
| GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | Jun 30, 2024 | Benchmarking, counterfactual | —Unverified | 0 |
| PerSEval: Assessing Personalization in Text Summarizers | Jun 29, 2024 | Benchmarking, Human Judgment Correlation | —Unverified | 0 |
| GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Jun 29, 2024 | Benchmarking, Hallucination | Code Available | 1 |
| iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities | Jun 27, 2024 | Benchmarking | Code Available | 1 |
| Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges | Jun 27, 2024 | Benchmarking, Clinical Knowledge | —Unverified | 0 |
| Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives | Jun 27, 2024 | Benchmarking | —Unverified | 0 |