| Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Jun 9, 2022 | Benchmarkingcontinuous-control | CodeCode Available | 2 |
| COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Oct 10, 2024 | BenchmarkingFairness | CodeCode Available | 2 |
| R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | Jan 18, 2024 | Benchmarking | CodeCode Available | 2 |
| Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions | Jan 28, 2022 | 3D Point Cloud Classification3D Point Cloud Data Augmentation | CodeCode Available | 2 |
| Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint) | Jan 14, 2023 | Benchmarking | CodeCode Available | 2 |
| Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond | Oct 9, 2024 | Benchmarking | CodeCode Available | 2 |
| COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Jan 15, 2021 | BenchmarkingMisinformation | CodeCode Available | 1 |
| Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels | Jan 30, 2024 | Benchmarkingimage-classification | CodeCode Available | 1 |
| RADAR: Benchmarking Language Models on Imperfect Tabular Data | Jun 9, 2025 | BenchmarkingMissing Values | CodeCode Available | 1 |
| Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Aug 8, 2023 | BenchmarkingGPU | CodeCode Available | 1 |