| LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research | Jun 19, 2025 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research | May 26, 2025 | scientific discovery | Code Available | 1 | 5 |
| BLADE: Benchmarking Language Model Agents for Data-Driven Science | Aug 19, 2024 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| AbductionRules: Training Transformers to Explain Unexpected Inputs | Mar 23, 2022 | Common Sense Reasoning, Logical Reasoning | Code Available | 1 | 5 |
| Offline Model-Based Optimization: Comprehensive Review | Mar 21, 2025 | model, Neural Architecture Search | Code Available | 1 | 5 |
| LGEM^+: a first-order logic framework for automated improvement of metabolic network models through abduction | Jun 9, 2023 | scientific discovery | Code Available | 0 | 5 |
| Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Jul 12, 2024 | Logical Reasoning, Multiple-choice | Code Available | 0 | 5 |
| APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics | Aug 15, 2023 | Drug Discovery, Prediction | Code Available | 0 | 5 |
| Learning to discover: expressive Gaussian mixture models for multi-dimensional simulation and parameter inference in the physical sciences | Aug 25, 2021 | scientific discovery | Code Available | 0 | 5 |
| An LLM-based Knowledge Synthesis and Scientific Reasoning Framework for Biomedical Discovery | Jun 26, 2024 | scientific discovery | Code Available | 0 | 5 |