| A Metadata-Driven Approach to Understand Graph Neural Networks | Oct 30, 2023 | Benchmarking, Graph Learning | —Unverified | 0 |
| FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization | Jun 25, 2025 | Benchmarking, Contrastive Learning | —Unverified | 0 |
| BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | Dec 3, 2023 | Benchmarking, Multi-agent Reinforcement Learning | —Unverified | 0 |
| BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text | May 22, 2025 | Benchmarking, RAG | —Unverified | 0 |
| ACT-Bench: Towards Action Controllable World Models for Autonomous Driving | Dec 6, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | Jan 30, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| Benchmarks as Microscopes: A Call for Model Metrology | Jul 22, 2024 | Benchmarking, model | —Unverified | 0 |
| FineText: Text Classification via Attention-based Language Model Fine-tuning | Oct 25, 2019 | Benchmarking, Classification | —Unverified | 0 |
| Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge | Apr 3, 2025 | Anatomy, Benchmarking | —Unverified | 0 |
| FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | Oct 7, 2023 | Benchmarking, named-entity-recognition | —Unverified | 0 |