| BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models | May 3, 2025 | Benchmarking, Hyperparameter Optimization | Unverified | 0 |
| Interpretable graph-based models on multimodal biomedical data integration: A technical review and benchmarking | May 3, 2025 | Benchmarking, Data Integration | Unverified | 0 |
| Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling | May 2, 2025 | Benchmarking | Unverified | 0 |
| EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models | May 2, 2025 | Benchmarking | Code Available | 0 |
| Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging | May 2, 2025 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models | May 2, 2025 | Benchmarking | Code Available | 0 |
| Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation | May 1, 2025 | Benchmarking, Position | Unverified | 0 |
| EnronQA: Towards Personalized RAG over Private Documents | May 1, 2025 | Benchmarking, Memorization | Unverified | 0 |
| Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook | May 1, 2025 | Benchmarking, Change Detection | Code Available | 2 |
| InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method | May 1, 2025 | Benchmarking, Motion Planning | Unverified | 0 |