| Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty | Mar 3, 2025 | scientific discovery | Unverified | 0 |
| Can Large Language Models Help Experimental Design for Causal Discovery? | Mar 3, 2025 | Causal Discovery, Experimental Design | Unverified | 0 |
| BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology | Feb 28, 2025 | Multiple-choice, scientific discovery | Code Available | 2 |
| CS-PaperSum: A Large-Scale Dataset of AI-Generated Summaries for Scientific Papers | Feb 27, 2025 | Information Retrieval, Retrieval | Unverified | 0 |
| Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation | Feb 26, 2025 | Ingenuity, scientific discovery | Code Available | 1 |
| Towards an AI co-scientist | Feb 26, 2025 | scientific discovery | Unverified | 0 |
| A Perspective on Symbolic Machine Learning in Physical Sciences | Feb 25, 2025 | scientific discovery | Unverified | 0 |
| Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs | Feb 21, 2025 | scientific discovery, valid | Unverified | 0 |
| Protein Large Language Models: A Comprehensive Survey | Feb 21, 2025 | Articles, Protein Structure Prediction | Code Available | 2 |
| InductionBench: LLMs Fail in the Simplest Complexity Class | Feb 20, 2025 | scientific discovery | Code Available | 1 |