| Robust Latent Matters: Boosting Image Generation with Sampling Error | Mar 11, 2025 | Benchmarking, Image Generation | Code Available | 3 |
| Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking | Mar 11, 2025 | Benchmarking | Unverified | 0 |
| ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness | Mar 11, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Mar 11, 2025 | Benchmarking, Hyperparameter Optimization | Code Available | 0 |
| Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models | Mar 10, 2025 | All, Benchmarking | Unverified | 0 |
| Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies | Mar 10, 2025 | Benchmarking, Ethics | Unverified | 0 |
| MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Mar 10, 2025 | Benchmarking, Medical Question Answering | Code Available | 2 |
| Illuminating Darkness: Enhancing Real-world Low-light Scenes with Smartphone Images | Mar 10, 2025 | 4k, Benchmarking | Code Available | 1 |
| Skelite: Compact Neural Networks for Efficient Iterative Skeletonization | Mar 10, 2025 | Benchmarking, Computational Efficiency | Code Available | 0 |
| Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark | Mar 10, 2025 | Autonomous Driving, Benchmarking | Code Available | 2 |