| Unreal Robotics Lab: A High-Fidelity Robotics Simulator with Advanced Physics and Rendering | Apr 19, 2025 | Benchmarking, Dataset Generation | Unverified | 0 |
| Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale | Apr 19, 2025 | Benchmarking | Code Available | 2 |
| CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | Apr 19, 2025 | Benchmarking | Unverified | 0 |
| AI Idea Bench 2025: AI Research Idea Generation Benchmark | Apr 19, 2025 | Benchmarking, scientific discovery | Unverified | 0 |
| Integrated Super-resolution Sensing and Symbiotic Communication with 3D Sparse MIMO for Low-Altitude UAV Swarm | Apr 18, 2025 | Benchmarking, Super-Resolution | Unverified | 0 |
| OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | Apr 18, 2025 | Benchmarking | Unverified | 0 |
| THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models | Apr 17, 2025 | Benchmarking, Math | Unverified | 0 |
| Benchmarking LLM-based Relevance Judgment Methods | Apr 17, 2025 | Benchmarking, Open-Domain Question Answering | Code Available | 0 |
| Benchmarking Multi-National Value Alignment for Large Language Models | Apr 17, 2025 | Benchmarking | Unverified | 0 |
| Enhancing Explainability and Reliable Decision-Making in Particle Swarm Optimization through Communication Topologies | Apr 17, 2025 | Benchmarking, Decision Making | Unverified | 0 |