| SurvUnc: A Meta-Model Based Uncertainty Quantification Framework for Survival Analysis | May 20, 2025 | Benchmarking, Model Optimization | Code Available | 0 |
| SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas | May 20, 2025 | Benchmarking, Logical Reasoning | Unverified | 0 |
| Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning | May 19, 2025 | Benchmarking | Code Available | 0 |
| LEXam: Benchmarking Legal Reasoning on 340 Law Exams | May 19, 2025 | Benchmarking, Legal Reasoning | Unverified | 0 |
| CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models | May 19, 2025 | Benchmarking, Red Teaming | Unverified | 0 |
| Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings | May 19, 2025 | Benchmarking, Combinatorial Optimization | Unverified | 0 |
| Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference | May 19, 2025 | Benchmarking, Causal Inference | Unverified | 0 |
| SzCORE as a benchmark: report from the seizure detection challenge at the 2025 AI in Epilepsy and Neurological Disorders Conference | May 19, 2025 | Benchmarking, EEG | Unverified | 0 |
| Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning | May 19, 2025 | Benchmarking | Unverified | 0 |
| A Comprehensive Benchmarking Platform for Deep Generative Models in Molecular Design | May 19, 2025 | Benchmarking, Drug Discovery | Unverified | 0 |