| Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases | Mar 6, 2025 | Benchmarking, Diagnostic | Code Available | 0 |
| InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference | Mar 6, 2025 | Benchmarking | Unverified | 0 |
| Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination | Mar 6, 2025 | Benchmarking | Unverified | 0 |
| ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions | Mar 6, 2025 | Benchmarking, HumanEval | Code Available | 0 |
| Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges | Mar 6, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| Eventprop training for efficient neuromorphic applications | Mar 6, 2025 | Benchmarking, GPU | Unverified | 0 |
| Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge | Mar 5, 2025 | Benchmarking, Image Reconstruction | Unverified | 0 |
| UnPuzzle: A Unified Framework for Pathology Image Analysis | Mar 5, 2025 | Benchmarking, Diagnostic | Code Available | 1 |
| GNNMerge: Merging of GNN Models Without Accessing Training Data | Mar 5, 2025 | Benchmarking, Computational Efficiency | Code Available | 0 |
| AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks | Mar 5, 2025 | Benchmarking, graph construction | Code Available | 0 |