| ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition | Mar 27, 2025 | Benchmarking, scientific discovery | Unverified | 0 |
| Iterative Hypothesis Generation for Scientific Discovery with Monte Carlo Nash Equilibrium Self-Refining Trees | Mar 25, 2025 | Large Language Model, scientific discovery | Unverified | 0 |
| SCI-IDEA: Context-Aware Scientific Ideation Using Token and Sentence Embeddings | Mar 25, 2025 | scientific discovery, Sentence | Unverified | 0 |
| Structuring Scientific Innovation: A Framework for Modeling and Discovering Impactful Knowledge Combinations | Mar 24, 2025 | Contrastive Learning, scientific discovery | Unverified | 0 |
| AgentRxiv: Towards Collaborative Autonomous Research | Mar 23, 2025 | Math, scientific discovery | Code Available | 9 |
| Offline Model-Based Optimization: Comprehensive Review | Mar 21, 2025 | model, Neural Architecture Search | Code Available | 1 |
| CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation | Mar 20, 2025 | Articles, scientific discovery | Unverified | 0 |
| MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Mar 17, 2025 | Articles, Benchmarking | Code Available | 1 |
| Lessons from the trenches on evaluating machine-learning systems in materials science | Mar 13, 2025 | scientific discovery | Unverified | 0 |
| SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models | Mar 12, 2025 | Benchmarking, Fairness | Unverified | 0 |
| Representation Retrieval Learning for Heterogeneous Data Integration | Mar 12, 2025 | Data Integration, Multi-Task Learning | Unverified | 0 |
| Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions | Mar 12, 2025 | Decision Making, scientific discovery | Unverified | 0 |
| Accelerating Earth Science Discovery via Multi-Agent LLM Systems | Mar 7, 2025 | Diversity, scientific discovery | Unverified | 0 |
| Large Language Models for Zero-shot Inference of Causal Structures in Biology | Mar 6, 2025 | Articles, scientific discovery | Unverified | 0 |
| Building Machine Learning Challenges for Anomaly Detection in Science | Mar 3, 2025 | Anomaly Detection, scientific discovery | Unverified | 0 |
| Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty | Mar 3, 2025 | scientific discovery | Unverified | 0 |
| Can Large Language Models Help Experimental Design for Causal Discovery? | Mar 3, 2025 | Causal Discovery, Experimental Design | Unverified | 0 |
| BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology | Feb 28, 2025 | Multiple-choice, scientific discovery | Code Available | 2 |
| CS-PaperSum: A Large-Scale Dataset of AI-Generated Summaries for Scientific Papers | Feb 27, 2025 | Information Retrieval, Retrieval | Unverified | 0 |
| Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation | Feb 26, 2025 | Ingenuity, scientific discovery | Code Available | 1 |
| Towards an AI co-scientist | Feb 26, 2025 | scientific discovery | Unverified | 0 |
| A Perspective on Symbolic Machine Learning in Physical Sciences | Feb 25, 2025 | scientific discovery | Unverified | 0 |
| Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs | Feb 21, 2025 | scientific discovery, valid | Unverified | 0 |
| Protein Large Language Models: A Comprehensive Survey | Feb 21, 2025 | Articles, Protein Structure Prediction | Code Available | 2 |
| InductionBench: LLMs Fail in the Simplest Complexity Class | Feb 20, 2025 | scientific discovery | Code Available | 1 |