The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers


Showing 26–50 of 177,340 papers

Title	Date	Tasks	Status	Hype	Score
Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking	Mar 14, 2025	All, Large Language Model	Code Available	14	5
Autonomous Agents for Collaborative Task under Information Asymmetry	Jun 21, 2024	Language Modelling, Large Language Model	Code Available	14	5
Qwen3 Technical Report	May 14, 2025	Code Generation, Mathematical Reasoning	Code Available	14	5
Qwen2.5 Technical Report	Dec 19, 2024	Common Sense Reasoning	Code Available	13	5
Qwen2 Technical Report	Jul 15, 2024	Arithmetic Reasoning, GSM8K	Code Available	13	5
R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization	May 21, 2025	Code Generation, Model Optimization	Code Available	13	5
Open-Sora: Democratizing Efficient Video Production for All	Dec 29, 2024	All, Image Generation	Code Available	13	5
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs	Feb 17, 2025		Code Available	13	5
MiniCPM-V: A GPT-4V Level MLLM on Your Phone	Aug 3, 2024	Hallucination, Multiple-choice	Code Available	12	5
Zep: A Temporal Knowledge Graph Architecture for Agent Memory	Jan 20, 2025	Large Language Model, RAG	Code Available	12	5
OmniParser for Pure Vision Based GUI Agent	Aug 1, 2024	Natural Language Visual Grounding	Code Available	12	5
SAM 2: Segment Anything in Images and Videos	Aug 1, 2024	Image Segmentation, Robot Manipulation Generalization	Code Available	12	5
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision	Jul 11, 2024	GPU, Quantization	Code Available	12	5
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics	Jun 2, 2025	Action Generation, GPU	Code Available	12	5
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence	Jan 25, 2024	Code Generation, Language Modeling	Code Available	11	5
Qwen2.5-Coder Technical Report	Sep 18, 2024	Code Generation	Code Available	11	5
EAP4EMSIG -- Experiment Automation Pipeline for Event-Driven Microscopy to Smart Microfluidic Single-Cells Analysis	Nov 6, 2024		Code Available	11	5
AgentScope: A Flexible yet Robust Multi-Agent Platform	Feb 21, 2024	Multi-agent Integration	Code Available	11	5
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security	Jun 8, 2024	Task Planning, Vulnerability Detection	Code Available	11	5
WebWalker: Benchmarking LLMs in Web Traversal	Jan 13, 2025	Benchmarking, Open-Domain Question Answering	Code Available	11	5
Gymnasium: A Standard Interface for Reinforcement Learning Environments	Jul 24, 2024	reinforcement-learning, Reinforcement Learning	Code Available	11	5
KAN: Kolmogorov-Arnold Networks	Apr 30, 2024	Kolmogorov-Arnold Networks	Code Available	11	5
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching	Oct 9, 2024	Denoising, text-to-speech	Code Available	11	5
HunyuanVideo: A Systematic Framework For Large Video Generative Models	Dec 3, 2024	Video Alignment, Video Generation	Code Available	11	5
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	Sep 18, 2024	Natural Language Visual Grounding	Code Available	11	5