| Joint Phase Shift Optimization and Precoder Selection for RIS-Assisted 5G NR MIMO Systems | May 29, 2025 | Benchmarking | Unverified | 0 |
| LLM Performance for Code Generation on Noisy Tasks | May 29, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| Toward Memory-Aided World Models: Benchmarking via Spatial Consistency | May 29, 2025 | Benchmarking, Minecraft | Code Available | 1 |
| VERINA: Benchmarking Verifiable Code Generation | May 29, 2025 | Benchmarking, Code Generation | Code Available | 2 |
| Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | May 28, 2025 | Benchmarking, Memorization | Code Available | 0 |
| RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | May 28, 2025 | Benchmarking, Red Teaming | Code Available | 1 |
| Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking | May 28, 2025 | Benchmarking | Code Available | 1 |
| StarBASE-GP: Biologically-Guided Automated Machine Learning for Genotype-to-Phenotype Association Analysis | May 28, 2025 | Benchmarking | Code Available | 0 |
| MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators | May 28, 2025 | Benchmarking, Chatbot | Code Available | 0 |
| Yambda-5B -- A Large-Scale Multi-modal Dataset for Ranking And Retrieval | May 28, 2025 | Benchmarking, Recommendation Systems | Unverified | 0 |