| Autonomous Microscopy Experiments through Large Language Model Agents | Dec 18, 2024 | Benchmarking, Experimental Design | Code Available | 1 | 5 |
| EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner | Jun 30, 2020 | Benchmarking, Depth Estimation | Code Available | 1 | 5 |
| IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics | Jul 8, 2020 | Bayesian Optimization, Benchmarking | Code Available | 1 | 5 |
| RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection | Jun 11, 2024 | Anomaly Detection, Benchmarking | Code Available | 1 | 5 |
| Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability | Apr 14, 2021 | Benchmarking, Link Prediction | Code Available | 1 | 5 |
| Enhancing Biomedical Relation Extraction with Directionality | Jan 23, 2025 | Benchmarking, Document-level Relation Extraction | Code Available | 1 | 5 |
| JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System | Mar 18, 2025 | Benchmarking, In-Context Learning | Code Available | 1 | 5 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Nov 4, 2024 | Action Generation, Benchmarking | Code Available | 1 | 5 |
| Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Jul 8, 2021 | Benchmarking | Code Available | 1 | 5 |
| BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Nov 21, 2023 | Benchmarking, Language Modeling | Code Available | 1 | 5 |