| PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | Jun 24, 2025 | Benchmarking, Drug Discovery | Code Available | 2 |
| QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges | Jun 24, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping | Jun 23, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey | Jun 23, 2025 | Benchmarking, Survey | Unverified | 0 |
| Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions | Jun 23, 2025 | Benchmarking, Sensitivity | Unverified | 0 |
| Staining normalization in histopathology: Method benchmarking using multicenter dataset | Jun 23, 2025 | Benchmarking | Unverified | 0 |
| Survey of HPC in US Research Institutions | Jun 23, 2025 | Benchmarking, GPU | Unverified | 0 |
| Benchmarking Music Generation Models and Metrics via Human Preference Studies | Jun 23, 2025 | Benchmarking, Music Generation | Unverified | 0 |
| Identifiable Convex-Concave Regression via Sub-gradient Regularised Least Squares | Jun 22, 2025 | Benchmarking, regression | Unverified | 0 |
| Statistical Multicriteria Evaluation of LLM-Generated Text | Jun 22, 2025 | Benchmarking, Diversity | Code Available | 0 |