SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 326–350 of 659,983 papers

Title | Status | Hype
Pyramidal Flow Matching for Efficient Video Generative Modeling | Code | 7
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7
TextGrad: Automatic "Differentiation" via Text | Code | 7
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | Code | 7
Flow-GRPO: Training Flow Matching Models via Online RL | Code | 7
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7
Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | Code | 7
DeepSeek-VL: Towards Real-World Vision-Language Understanding | Code | 7
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | Code | 7
Grants4Companies: Applying Declarative Methods for Recommending and Reasoning About Business Grants in the Austrian Public Administration (System Description) | Code | 7
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models | Code | 7
PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods | Code | 7
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering | Code | 7
Dynamic Evaluation of Large Language Models by Meta Probing Agents | Code | 7
Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | Code | 7
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Code | 7
AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents | Code | 7
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP | Code | 7
ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? | Code | 7
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | Code | 7
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Code | 7
PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System | Code | 7
AutoTrain: No-code training for state-of-the-art models | Code | 7
The Road Less Scheduled | Code | 7
Page 14 of 26,400