SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 1-50 of 474,278 papers

Title | Status | Hype
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory | Code | 16
DeepSeek-V3 Technical Report | Code | 16
MinerU: An Open-Source Solution for Precise Document Content Extraction | Code | 16
Docling Technical Report | Code | 16
AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | Code | 16
OpenHands: An Open Platform for AI Software Developers as Generalist Agents | Code | 16
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | Code | 16
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion | Code | 15
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Code | 15
Qwen3 Technical Report | Code | 14
Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking | Code | 14
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k | Code | 14
UI-TARS: Pioneering Automated GUI Interaction with Native Agents | Code | 14
TradingAgents: Multi-Agents LLM Financial Trading Framework | Code | 14
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs | Code | 14
LightRAG: Simple and Fast Retrieval-Augmented Generation | Code | 14
FLUX that Plays Music | Code | 14
Autonomous Agents for Collaborative Task under Information Asymmetry | Code | 14
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Code | 14
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | Code | 14
From Local to Global: A Graph RAG Approach to Query-Focused Summarization | Code | 14
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | Code | 14
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | Code | 14
R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization | Code | 13
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | Code | 13
Open-Sora: Democratizing Efficient Video Production for All | Code | 13
Qwen2.5 Technical Report | Code | 13
Qwen2 Technical Report | Code | 13
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Code | 12
Zep: A Temporal Knowledge Graph Architecture for Agent Memory | Code | 12
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Code | 12
OmniParser for Pure Vision Based GUI Agent | Code | 12
SAM 2: Segment Anything in Images and Videos | Code | 12
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Code | 12
Qwen3-Coder-Next Technical Report | - | 11
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints | - | 11
WebSailor: Navigating Super-human Reasoning for Web Agent | Code | 11
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation | Code | 11
WebDancer: Towards Autonomous Information Seeking Agency | Code | 11
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | Code | 11
Absolute Zero: Reinforced Self-play Reasoning with Zero Data | Code | 11
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation | Code | 11
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Code | 11
Wan: Open and Advanced Large-Scale Video Generative Models | Code | 11
VGGT: Visual Geometry Grounded Transformer | Code | 11
YOLOE: Real-Time Seeing Anything | Code | 11
Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models | Code | 11
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Code | 11
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models | Code | 11
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models | Code | 11