SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 1-25 of 177,340 papers

Title | Status | Hype
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles | Verified | 2
Energy-Based Transformers are Scalable Learners and Thinkers | Verified | 5
Training independent subnetworks for robust prediction | Verified | 1
Deep Ensembles: A Loss Landscape Perspective | Verified | 1
Universal Reasoning Model | Verified | 1
OpenHands: An Open Platform for AI Software Developers as Generalist Agents | Code | 16
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | Code | 16
MinerU: An Open-Source Solution for Precise Document Content Extraction | Code | 16
Docling Technical Report | Code | 16
DeepSeek-V3 Technical Report | Code | 16
AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | Code | 16
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory | Code | 16
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion | Code | 15
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Code | 15
LightRAG: Simple and Fast Retrieval-Augmented Generation | Code | 14
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | Code | 14
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | Code | 14
TradingAgents: Multi-Agents LLM Financial Trading Framework | Code | 14
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Code | 14
From Local to Global: A Graph RAG Approach to Query-Focused Summarization | Code | 14
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs | Code | 14
FLUX that Plays Music | Code | 14
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | Code | 14
UI-TARS: Pioneering Automated GUI Interaction with Native Agents | Code | 14
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k | Code | 14