| MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | Apr 4, 2025 | Benchmarking, Image Generation | Unverified | 0 |
| Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems | Apr 4, 2025 | Benchmarking, Model Selection | Code Available | 0 |
| Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins | Apr 4, 2025 | Benchmarking | Unverified | 0 |
| Do LLM Evaluators Prefer Themselves for a Reason? | Apr 4, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams | Apr 4, 2025 | Benchmarking, Management | Unverified | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Apr 4, 2025 | Benchmarking, GSM8K | Unverified | 0 |
| Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings | Apr 4, 2025 | Benchmarking | Code Available | 0 |
| Point Cloud Objective Quality: Benchmarking Features and Quality Evaluation | Apr 4, 2025 | Attribute, Benchmarking | Unverified | 0 |
| Evaluating AI Recruitment Sourcing Tools by Human Preference | Apr 3, 2025 | Benchmarking | Code Available | 0 |
| Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge | Apr 3, 2025 | Anatomy, Benchmarking | Unverified | 0 |