| Unraveling the Capabilities of Language Models in News Summarization | Jan 30, 2025 | Benchmarking, Few-Shot Learning | Code Available | 0 |
| mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Jun 26, 2025 | Anomaly Detection, Benchmarking | Code Available | 0 |
| FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks | Apr 18, 2021 | Benchmarking, Federated Learning | Code Available | 0 |
| MUBen: Benchmarking the Uncertainty of Molecular Representation Models | Jun 14, 2023 | Benchmarking, Drug Discovery | Code Available | 0 |
| The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection | Sep 17, 2024 | Benchmarking, Event Detection | Code Available | 0 |
| WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection | Mar 13, 2020 | Abuse Detection, Benchmarking | Code Available | 0 |
| FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Jun 8, 2023 | Benchmarking, Federated Learning | Code Available | 0 |
| Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning | May 27, 2025 | Benchmarking | Code Available | 0 |
| Feature interpretability in BCIs: exploring the role of network lateralization | Jul 16, 2024 | Benchmarking, EEG | Code Available | 0 |
| AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Oct 28, 2024 | Benchmarking, Question Answering | Code Available | 0 |