| STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking | May 16, 2025 | Benchmarking | Code Available | 0 |
| CleanPatrick: A Benchmark for Image Data Cleaning | May 16, 2025 | Benchmarking, Label Error Detection | Code Available | 0 |
| Visual Anomaly Detection under Complex View-Illumination Interplay: A Large-Scale Benchmark | May 16, 2025 | Anomaly Detection, Benchmarking | —Unverified | 0 |
| Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets | May 16, 2025 | Benchmarking, Knowledge Graphs | —Unverified | 0 |
| Benchmarking performance, explainability, and evaluation strategies of vision-language models for surgery: Challenges and opportunities | May 16, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges | May 16, 2025 | Benchmarking, State Estimation | Code Available | 0 |
| TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs | May 16, 2025 | Benchmarking, Question Answering | Code Available | 0 |
| MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models | May 16, 2025 | Benchmarking, Decision Making | —Unverified | 0 |
| ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems | May 16, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation | May 15, 2025 | Benchmarking, Depth Estimation | —Unverified | 0 |