| Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research | Jan 29, 2025 | Benchmarking | Unverified | 0 |
| SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model | Jan 28, 2025 | Benchmarking, Language Modeling | Code Available | 2 |
| HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Jan 28, 2025 | Adversarial Attack, Benchmarking | Code Available | 1 |
| Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection | Jan 28, 2025 | Benchmarking | Unverified | 0 |
| Molecular-driven Foundation Model for Oncologic Pathology | Jan 28, 2025 | Benchmarking, Diagnostic | Code Available | 4 |
| Making Sense of Data in the Wild: Data Analysis Automation at Scale | Jan 27, 2025 | Benchmarking, Diversity | Unverified | 0 |
| A Benchmarking Environment for Worker Flexibility in Flexible Job Shop Scheduling Problems | Jan 27, 2025 | Benchmarking, Evolutionary Algorithms | Unverified | 0 |
| PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | Jan 27, 2025 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | Jan 27, 2025 | Benchmarking, Diversity | Unverified | 0 |
| Benchmarking Quantum Reinforcement Learning | Jan 27, 2025 | Benchmarking, reinforcement-learning | Code Available | 0 |