SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 326-350 of 177,339 papers

Title | Status | Hype
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7
Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | Code | 7
DeepSeek-VL: Towards Real-World Vision-Language Understanding | Code | 7
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | Code | 7
Grants4Companies: Applying Declarative Methods for Recommending and Reasoning About Business Grants in the Austrian Public Administration (System Description) | Code | 7
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models | Code | 7
PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods | Code | 7
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering | Code | 7
Dynamic Evaluation of Large Language Models by Meta Probing Agents | Code | 7
Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | Code | 7
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Code | 7
AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents | Code | 7
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP | Code | 7
ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? | Code | 7
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | Code | 7
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Code | 7
PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System | Code | 7
AutoTrain: No-code training for state-of-the-art models | Code | 7
ThunderKittens: Simple, Fast, and Adorable AI Kernels | Code | 7
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Code | 7
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity | Code | 7
A Scalable Approach to Clustering Embedding Projections | Code | 7
Real-Time Video Generation with Pyramid Attention Broadcast | Code | 7
Stable Audio Open | Code | 7
Page 14 of 7094