| Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | Jun 17, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 14 |
| SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | May 6, 2024 | Bug fixingLanguage Modeling | CodeCode Available | 11 |
| Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way | Aug 28, 2024 | Code GenerationNavigate | CodeCode Available | 11 |
| UFO: A UI-Focused Agent for Windows OS Interaction | Feb 8, 2024 | Navigate | CodeCode Available | 9 |
| Mirage: A Multi-Level Superoptimizer for Tensor Programs | May 9, 2024 | GPUNavigate | CodeCode Available | 7 |
| Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Jul 12, 2023 | FairnessImage Classification | CodeCode Available | 6 |
| Training Compute-Optimal Large Language Models | Mar 29, 2022 | AnachronismsAnalogical Similarity | CodeCode Available | 6 |
| IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | Jan 19, 2025 | Navigate | CodeCode Available | 5 |
| WebThinker: Empowering Large Reasoning Models with Deep Research Capability | Apr 30, 2025 | Navigate | CodeCode Available | 5 |
| AppAgent: Multimodal Agents as Smartphone Users | Dec 21, 2023 | Navigate | CodeCode Available | 5 |
| ChatDBG: Augmenting Debugging with Large Language Models | Mar 25, 2024 | C++ codeNavigate | CodeCode Available | 5 |
| DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments | Apr 4, 2025 | NavigatePrompt Engineering | CodeCode Available | 4 |
| VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Jun 20, 2025 | NavigateVision-Language Navigation | CodeCode Available | 4 |
| LocAgent: Graph-Guided LLM Agents for Code Localization | Mar 12, 2025 | GitHub issue resolutionNavigate | CodeCode Available | 4 |
| RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark | Jun 29, 2023 | Combinatorial OptimizationComputational Efficiency | CodeCode Available | 4 |
| GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS | Aug 2, 2024 | GPUNavigate | CodeCode Available | 4 |
| EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation | Jan 29, 2023 | GPUNavigate | CodeCode Available | 4 |
| Diffusion Models for Medical Image Analysis: A Comprehensive Survey | Nov 14, 2022 | DenoisingMedical Image Analysis | CodeCode Available | 4 |
| Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Jun 5, 2024 | MambaMedical Image Analysis | CodeCode Available | 3 |
| Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | Dec 5, 2024 | Multimodal ReasoningNatural Language Visual Grounding | CodeCode Available | 3 |
| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Oct 7, 2024 | Natural Language Visual GroundingNavigate | CodeCode Available | 3 |
| A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models | Jul 2, 2024 | Navigate | CodeCode Available | 3 |
| CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving | May 15, 2024 | Autonomous DrivingAutonomous Vehicles | CodeCode Available | 3 |
| From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery | May 19, 2025 | Navigatescientific discovery | CodeCode Available | 3 |
| DayDreamer: World Models for Physical Robot Learning | Jun 28, 2022 | Deep Reinforcement LearningNavigate | CodeCode Available | 2 |