| CVC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models | Jun 2, 2025 | Benchmarking | CodeCode Available | 0 |
| Benchmarking Neural Speech Codec Intelligibility with SITool | Jun 2, 2025 | BenchmarkingDiagnostic | —Unverified | 0 |
| ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code | Jun 2, 2025 | BenchmarkingCode Generation | —Unverified | 0 |
| ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness | Jun 1, 2025 | BenchmarkingManagement | CodeCode Available | 0 |
| MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book | Jun 1, 2025 | Benchmarking | CodeCode Available | 0 |
| ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models | Jun 1, 2025 | BenchmarkingRelational Reasoning | —Unverified | 0 |
| The iNaturalist Sounds Dataset | May 31, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking Foundation Models for Zero-Shot Biometric Tasks | May 30, 2025 | AttributeBenchmarking | —Unverified | 0 |
| GenSpace: Benchmarking Spatially-Aware Image Generation | May 30, 2025 | BenchmarkingImage Generation | —Unverified | 0 |
| Progressive Class-level Distillation | May 30, 2025 | BenchmarkingKnowledge Distillation | —Unverified | 0 |