The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Recently Added | Most Hyped | Most Active | Needs Verification | Most Verified

Showing 1–50 of 474,278 papers

Title	Date	Tasks	Status	Hype
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory	Apr 28, 2025	RAG, Retrieval-augmented Generation	Code Available	16
DeepSeek-V3 Technical Report	Dec 27, 2024	GPU, Language Modeling	Code Available	16
MinerU: An Open-Source Solution for Precise Document Content Extraction	Sep 27, 2024	Diversity, Optical Character Recognition (OCR)	Code Available	16
Docling Technical Report	Aug 19, 2024		Code Available	16
AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems	Aug 9, 2024		Code Available	16
OpenHands: An Open Platform for AI Software Developers as Generalist Agents	Jul 23, 2024		Code Available	16
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information	Feb 21, 2024	object-detection, Object Detection	Code Available	16
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion	Mar 14, 2025	Language Modeling, Language Modelling	Code Available	15
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning	Jan 22, 2025	Mathematical Reasoning, Multi-task Language Understanding	Code Available	15
Qwen3 Technical Report	May 14, 2025	Code Generation, Mathematical Reasoning	Code Available	14
Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking	Mar 14, 2025	All, Large Language Model	Code Available	14
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k	Mar 12, 2025	Video Generation	Code Available	14
UI-TARS: Pioneering Automated GUI Interaction with Native Agents	Jan 21, 2025		Code Available	14
TradingAgents: Multi-Agents LLM Financial Trading Framework	Dec 28, 2024	Management	Code Available	14
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs	Oct 21, 2024		Code Available	14
LightRAG: Simple and Fast Retrieval-Augmented Generation	Oct 8, 2024	Information Retrieval, RAG	Code Available	14
FLUX that Plays Music	Sep 1, 2024	Music Generation, Text-to-Music Generation	Code Available	14
Autonomous Agents for Collaborative Task under Information Asymmetry	Jun 21, 2024	Language Modelling, Large Language Model	Code Available	14
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	Jun 18, 2024	All, GSM8K	Code Available	14
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs	Jun 17, 2024	Language Modeling, Language Modelling	Code Available	14
From Local to Global: A Graph RAG Approach to Query-Focused Summarization	Apr 24, 2024	Query-focused Summarization, Question Answering	Code Available	14
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference	Mar 7, 2024	Chatbot	Code Available	14
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models	Feb 22, 2024	Articles, Retrieval	Code Available	14
R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization	May 21, 2025	Code Generation, Model Optimization	Code Available	13
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs	Feb 17, 2025		Code Available	13
Open-Sora: Democratizing Efficient Video Production for All	Dec 29, 2024	All, Image Generation	Code Available	13
Qwen2.5 Technical Report	Dec 19, 2024	Common Sense Reasoning	Code Available	13
Qwen2 Technical Report	Jul 15, 2024	Arithmetic Reasoning, GSM8K	Code Available	13
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics	Jun 2, 2025	Action Generation, GPU	Code Available	12
Zep: A Temporal Knowledge Graph Architecture for Agent Memory	Jan 20, 2025	Large Language Model, RAG	Code Available	12
MiniCPM-V: A GPT-4V Level MLLM on Your Phone	Aug 3, 2024	Hallucination, Multiple-choice	Code Available	12
OmniParser for Pure Vision Based GUI Agent	Aug 1, 2024	Natural Language Visual Grounding	Code Available	12
SAM 2: Segment Anything in Images and Videos	Aug 1, 2024	Image Segmentation, Robot Manipulation Generalization	Code Available	12
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision	Jul 11, 2024	GPU, Quantization	Code Available	12
Qwen3-Coder-Next Technical Report	Feb 28, 2026		Unverified	11
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints	Jan 26, 2026		Unverified	11
WebSailor: Navigating Super-human Reasoning for Web Agent	Jul 3, 2025		Code Available	11
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation	May 29, 2025	Large Language Model	Code Available	11
WebDancer: Towards Autonomous Information Seeking Agency	May 28, 2025		Code Available	11
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training	May 23, 2025	Automatic Speech Recognition, Emotion Recognition	Code Available	11
Absolute Zero: Reinforced Self-play Reasoning with Zero Data	May 6, 2025	Mathematical Reasoning	Code Available	11
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation	Apr 17, 2025		Code Available	11
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents	Apr 1, 2025	AI Agent, Task Planning	Code Available	11
Wan: Open and Advanced Large-Scale Video Generative Models	Mar 26, 2025	Video Editing, Video Generation	Code Available	11
VGGT: Visual Geometry Grounded Transformer	Mar 14, 2025	Depth Estimation, Novel View Synthesis	Code Available	11
YOLOE: Real-Time Seeing Anything	Mar 10, 2025	10-shot image generation	Code Available	11
Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models	Mar 5, 2025	Hallucination, Instruction Following	Code Available	11
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens	Mar 3, 2025	Attribute, text-to-speech	Code Available	11
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models	Feb 28, 2025	MMLU	Code Available	11
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models	Feb 25, 2025	Diversity, Language Modeling	Code Available	11