| NetPress: Dynamically Generated LLM Benchmarks for Network Applications | Jun 3, 2025 | Benchmarking | Code Available | 1 |
| Rethinking Machine Unlearning in Image Generation Models | Jun 3, 2025 | Benchmarking, Image Generation | Code Available | 1 |
| CVC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models | Jun 2, 2025 | Benchmarking | Code Available | 0 |
| FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents | Jun 2, 2025 | Benchmarking, Form | Unverified | 0 |
| TIIF-Bench: How Does Your T2I Model Follow Your Instructions? | Jun 2, 2025 | Benchmarking, Instruction Following | Unverified | 0 |
| Benchmarking Neural Speech Codec Intelligibility with SITool | Jun 2, 2025 | Benchmarking, Diagnostic | Unverified | 0 |
| ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code | Jun 2, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists | Jun 2, 2025 | Benchmarking, Form | Unverified | 0 |
| Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices | Jun 2, 2025 | Benchmarking | Unverified | 0 |
| GSCodec Studio: A Modular Framework for Gaussian Splat Compression | Jun 2, 2025 | Benchmarking | Code Available | 2 |