| MUPAX: Multidimensional Problem Agnostic eXplainable AI | Jul 17, 2025 | Anatomical Landmark Detection, Audio Classification | Unverified | 0 |
| DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Jul 16, 2025 | Benchmarking, Knowledge Distillation | Code Available | 0 |
| DCR: Quantifying Data Contamination in LLMs Evaluation | Jul 15, 2025 | Arithmetic Reasoning, Benchmarking | Code Available | 0 |
| A Multi-View High-Resolution Foot-Ankle Complex Point Cloud Dataset During Gait for Occlusion-Robust 3D Completion | Jul 15, 2025 | Benchmarking, Point Cloud Completion | Unverified | 0 |
| FLsim: A Modular and Library-Agnostic Simulation Framework for Federated Learning | Jul 15, 2025 | Benchmarking, Federated Learning | Code Available | 0 |
| Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop | Jul 14, 2025 | Benchmarking | Unverified | 0 |
| MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking | Jul 14, 2025 | Benchmarking, Language Modeling | Unverified | 0 |
| CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Jul 14, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance | Jul 14, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models | Jul 13, 2025 | Attribute, Benchmarking | Code Available | 0 |