| Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO | Aug 30, 2023 | Benchmarking, Reinforcement Learning (RL) | —Unverified | 0 |
| Benchmarking Robot Manipulation with the Rubik's Cube | Feb 14, 2022 | Benchmarking, Robot Manipulation | —Unverified | 0 |
| A Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms | Dec 1, 2015 | Benchmarking, Image Generation | —Unverified | 0 |
| Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | May 13, 2024 | Benchmarking, counterfactual | —Unverified | 0 |
| A Systematic Analysis of Hybrid Linear Attention | Jul 8, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| Benchmarking Retrieval-Augmented Generation for Chemistry | May 12, 2025 | Benchmarking, RAG | —Unverified | 0 |
| Self-Aligning Depth-regularized Radiance Fields for Asynchronous RGB-D Sequences | Nov 14, 2022 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Airport Capacity and Performance in Europe -- A study of transport economics, service quality and sustainability | Feb 4, 2021 | Benchmarking | —Unverified | 0 |
| Benchmarking Resource Usage for Efficient Distributed Deep Learning | Jan 28, 2022 | Benchmarking, Deep Learning | —Unverified | 0 |
| Goal-Driven Sequential Data Abstraction | Jul 29, 2019 | Benchmarking, General Reinforcement Learning | —Unverified | 0 |
| A Survey on Vision Autoregressive Model | Nov 13, 2024 | 3D Generation, Benchmarking | —Unverified | 0 |
| A Survey on Temporal Sentence Grounding in Videos | Sep 16, 2021 | Action Localization, Benchmarking | —Unverified | 0 |
| Benchmarking Reinforcement Learning Methods for Dexterous Robotic Manipulation with a Three-Fingered Gripper | Aug 27, 2024 | Benchmarking, Reinforcement Learning (RL) | —Unverified | 0 |
| 4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions | Dec 31, 2022 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD) | Jun 15, 2020 | Benchmarking, Domain Adaptation | —Unverified | 0 |
| GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models | Apr 10, 2024 | Benchmarking, Denoising | —Unverified | 0 |
| Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings | May 19, 2025 | Benchmarking, Combinatorial Optimization | —Unverified | 0 |
| Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices | Jun 2, 2025 | Benchmarking | —Unverified | 0 |
| Helsinki Deblur Challenge 2021: description of photographic data | May 21, 2021 | Benchmarking, Deblurring | —Unverified | 0 |
| A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams | Jun 16, 2021 | Active Learning, Benchmarking | —Unverified | 0 |
| A Survey on Preserving Fairness Guarantees in Changing Environments | Nov 14, 2022 | Benchmarking, Decision Making | —Unverified | 0 |
| Benchmarking Reasoning Robustness in Large Language Models | Mar 6, 2025 | Benchmarking, Math | —Unverified | 0 |
| Benchmarking real-time monitoring strategies for ethanol production from lignocellulosic biomass | Jan 29, 2021 | Benchmarking | —Unverified | 0 |
| Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods | May 17, 2021 | Benchmarking, Diversity | —Unverified | 0 |
| Feasibility of BERT Embeddings For Domain-Specific Knowledge Mining | Jan 16, 2022 | Benchmarking, Language Modelling | —Unverified | 0 |