| A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge | May 8, 2025 | Benchmarking | Code Available | 0 |
| Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters | May 8, 2025 | Benchmarking | Unverified | 0 |
| QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation | May 8, 2025 | Benchmarking, Federated Learning | Unverified | 0 |
| Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective | May 8, 2025 | Active Learning, Benchmarking | Code Available | 0 |
| Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection | May 8, 2025 | Benchmarking, Out-of-Distribution Generalization | Unverified | 0 |
| Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents | May 8, 2025 | Benchmarking | Unverified | 0 |
| Advancing and Benchmarking Personalized Tool Invocation for LLMs | May 7, 2025 | Benchmarking, World Knowledge | Code Available | 0 |
| Alpha Excel Benchmark | May 7, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers | May 7, 2025 | Benchmarking, Fault Detection | Code Available | 0 |
| False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims | May 7, 2025 | Benchmarking | Code Available | 0 |