| Better than classical? The subtle art of benchmarking quantum machine learning models | Mar 11, 2024 | Benchmarking, Binary Classification | Code Available | 7 |
| Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | Nov 28, 2023 | Electrical Engineering, Experimental Design | Code Available | 5 |
| Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents | Oct 17, 2024 | Experimental Design | Code Available | 4 |
| NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals | Jul 18, 2024 | Experimental Design, GPU | Code Available | 4 |
| Predicting from Strings: Language Model Embeddings for Bayesian Optimization | Oct 14, 2024 | Bayesian Optimization, Experimental Design | Code Available | 3 |
| Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | Sep 6, 2024 | Experimental Design, scientific discovery | Code Available | 3 |
| OmniPred: Language Models as Universal Regressors | Feb 22, 2024 | Experimental Design, regression | Code Available | 3 |
| Attention is not not Explanation | Aug 13, 2019 | Decision Making, Diagnostic | Code Available | 3 |
| Reviving The Classics: Active Reward Modeling in Large Language Model Alignment | Feb 4, 2025 | Computational Efficiency, Experimental Design | Code Available | 2 |
| Honegumi: An Interface for Accelerating the Adoption of Bayesian Optimization in the Experimental Sciences | Feb 4, 2025 | Bayesian Optimization, Experimental Design | Code Available | 2 |
| Probing the limitations of multimodal language models for chemistry and materials research | Nov 25, 2024 | Experimental Design, Spatial Reasoning | Code Available | 2 |
| Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | Oct 12, 2024 | Experimental Design, scientific discovery | Code Available | 2 |
| OpenBox: A Python Toolkit for Generalized Black-box Optimization | Apr 26, 2023 | Experimental Design | Code Available | 2 |
| hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices | Mar 9, 2021 | BIG-bench Machine Learning, Diagnostic | Code Available | 2 |
| BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization | Oct 14, 2019 | Bayesian Optimisation, Bayesian Optimization | Code Available | 2 |
| A friendly introduction to triangular transport | Mar 27, 2025 | Bayesian Inference, Decision Making | Code Available | 1 |
| Gemstones: A Model Suite for Multi-Faceted Scaling Laws | Feb 7, 2025 | Experimental Design, Language Modeling | Code Available | 1 |
| Active Task Disambiguation with LLMs | Feb 6, 2025 | Experimental Design, Question Selection | Code Available | 1 |
| Autonomous Microscopy Experiments through Large Language Model Agents | Dec 18, 2024 | Benchmarking, Experimental Design | Code Available | 1 |
| Confident Teacher, Confident Student? A Novel User Study Design for Investigating the Didactic Potential of Explanations and their Impact on Uncertainty | Sep 10, 2024 | Experimental Design, Explainable artificial intelligence | Code Available | 1 |
| Evaluating Multiview Object Consistency in Humans and Image Models | Sep 9, 2024 | Experimental Design | Code Available | 1 |
| Toward Automated Simulation Research Workflow through LLM Prompt Engineering Design | Aug 28, 2024 | Experimental Design, Prompt Engineering | Code Available | 1 |
| GitHub is an effective platform for collaborative and reproducible laboratory research | Aug 18, 2024 | Experimental Design, Transfer Learning | Code Available | 1 |
| SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It) | Jun 25, 2024 | Benchmarking, Experimental Design | Code Available | 1 |
| Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics | Mar 21, 2024 | DeepFake Detection, Experimental Design | Code Available | 1 |