| BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function | Apr 9, 2021 | Benchmarking, General Classification | —Unverified | 0 | 0 |
| AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI | Jan 9, 2025 | Benchmarking, named-entity-recognition | —Unverified | 0 | 0 |
| Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems | Jul 7, 2022 | Benchmarking | —Unverified | 0 | 0 |
| AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models | Sep 5, 2023 | Benchmarking, Zero-Shot Learning | —Unverified | 0 | 0 |
| Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | Jun 11, 2025 | Benchmarking | —Unverified | 0 | 0 |
| BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | Dec 3, 2023 | Benchmarking, Multi-agent Reinforcement Learning | —Unverified | 0 | 0 |
| gSuite: A Flexible and Framework Independent Benchmark Suite for Graph Neural Network Inference on GPUs | Oct 20, 2022 | Benchmarking, Computational Efficiency | —Unverified | 0 | 0 |
| GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation | Jul 8, 2024 | Benchmarking, Graph Embedding | —Unverified | 0 | 0 |
| Benchmarks as Microscopes: A Call for Model Metrology | Jul 22, 2024 | Benchmarking, model | —Unverified | 0 | 0 |
| The Curious Case of Integrator Reach Sets, Part I: Basic Theory | Feb 23, 2021 | Benchmarking | —Unverified | 0 | 0 |