| Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | Feb 20, 2025 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| The Benchmark Lottery | Jul 14, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms | Apr 2, 2025 | Benchmarking, Semantic Segmentation | —Unverified | 0 | 0 |
| Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods | May 17, 2021 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding | Aug 1, 2020 | Benchmarking, Rain Removal | —Unverified | 0 | 0 |
| GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation | May 17, 2025 | Benchmarking | —Unverified | 0 | 0 |
| GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System | Apr 5, 2024 | Benchmarking, GPU | —Unverified | 0 | 0 |
| A Benchmark for Multi-speaker Anonymization | Jul 8, 2024 | Benchmarking, Disentanglement | —Unverified | 0 | 0 |
| Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior | May 9, 2021 | Benchmarking, Rain Removal | —Unverified | 0 | 0 |
| Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks | Jul 29, 2024 | Benchmarking, Language Model Evaluation | —Unverified | 0 | 0 |