SOTAVerified


Papers

Showing 1-25 of 1982 papers

| Title | Status | Hype |
|---|---|---|
| Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | Code | 14 |
| Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way | Code | 11 |
| SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | Code | 11 |
| UFO: A UI-Focused Agent for Windows OS Interaction | Code | 9 |
| Mirage: A Multi-Level Superoptimizer for Tensor Programs | Code | 7 |
| Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Code | 6 |
| Training Compute-Optimal Large Language Models | Code | 6 |
| WebThinker: Empowering Large Reasoning Models with Deep Research Capability | Code | 5 |
| IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | Code | 5 |
| ChatDBG: Augmenting Debugging with Large Language Models | Code | 5 |
| AppAgent: Multimodal Agents as Smartphone Users | Code | 5 |
| VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Code | 4 |
| DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments | Code | 4 |
| LocAgent: Graph-Guided LLM Agents for Code Localization | Code | 4 |
| GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS | Code | 4 |
| RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark | Code | 4 |
| EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation | Code | 4 |
| Diffusion Models for Medical Image Analysis: A Comprehensive Survey | Code | 4 |
| From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery | Code | 3 |
| Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | Code | 3 |
| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Code | 3 |
| A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models | Code | 3 |
| Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Code | 3 |
| CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving | Code | 3 |
| AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench | Code | 2 |
Page 1 of 80

No leaderboard results yet.