SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers · 257,923 code links · 4,818 tasks

Papers

Showing 1-50 of 658,356 papers

Title | Status | Hype
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | Code | 16
MinerU: An Open-Source Solution for Precise Document Content Extraction | Code | 16
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion | Code | 15
YOLOv11: An Overview of the Key Architectural Enhancements | Code | 15
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Code | 15
DeepSeek-V3 Technical Report | Code | 15
Docling Technical Report | Code | 15
AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | Code | 15
OpenHands: An Open Platform for AI Software Developers as Generalist Agents | Code | 15
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory | Code | 15
LightRAG: Simple and Fast Retrieval-Augmented Generation | Code | 14
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | Code | 14
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | Code | 14
TradingAgents: Multi-Agents LLM Financial Trading Framework | Code | 14
Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking | Code | 13
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Code | 13
UI-TARS: Pioneering Automated GUI Interaction with Native Agents | Code | 13
Qwen2 Technical Report | Code | 13
R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization | Code | 13
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs | Code | 13
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k | Code | 13
Open-Sora: Democratizing Efficient Video Production for All | Code | 13
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | Code | 13
FLUX that Plays Music | Code | 13
Autonomous Agents for Collaborative Task under Information Asymmetry | Code | 13
Qwen3 Technical Report | Code | 13
From Local to Global: A Graph RAG Approach to Query-Focused Summarization | Code | 13
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | Code | 13
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | Code | 13
Qwen2.5 Technical Report | Code | 13
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Code | 12
Zep: A Temporal Knowledge Graph Architecture for Agent Memory | Code | 12
OmniParser for Pure Vision Based GUI Agent | Code | 12
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints | | 11
Qwen3-Coder-Next Technical Report | | 11
InstantID: Zero-shot Identity-Preserving Generation in Seconds | Code | 11
KAN 2.0: Kolmogorov-Arnold Networks Meet Science | Code | 11
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | Code | 11
USP: A Unified Sequence Parallelism Approach for Long Context Generative AI | Code | 11
Mixtures of Experts Unlock Parameter Scaling for Deep RL | Code | 11
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | Code | 11
BioMamba: Leveraging Spectro-Temporal Embedding in Bidirectional Mamba for Enhanced Biosignal Classification | Code | 11
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction | Code | 11
BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | Code | 11
Absolute Zero: Reinforced Self-play Reasoning with Zero Data | Code | 11
Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models | Code | 11
HunyuanVideo: A Systematic Framework For Large Video Generative Models | Code | 11
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Code | 11
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | Code | 11
On the Design and Analysis of LLM-Based Algorithms | Code | 11
Page 1 of 13,168