The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 351–375 of 177339 papers

Title	Date	Tasks	Status	Hype	Score
OpenThoughts: Data Recipes for Reasoning Models	Jun 4, 2025	Math	Code Available	7	5
Training AI to be Loyal	Jan 27, 2025		Code Available	7	5
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models	Apr 19, 2024		Code Available	7	5
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers	May 27, 2025		Code Available	7	5
MoBA: Mixture of Block Attention for Long-Context LLMs	Feb 18, 2025	Mixture-of-Experts	Code Available	7	5
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?	Nov 25, 2024	Hallucination, Knowledge Distillation	Code Available	7	5
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement	Oct 17, 2024	GPU, Real-Time Object Detection	Code Available	7	5
pySLAM: An Open-Source, Modular, and Extensible Framework for SLAM	Feb 17, 2025	Depth Estimation, Depth Prediction	Code Available	7	5
Exploring Compressed Image Representation as a Perceptual Proxy: A Study	Jan 14, 2024	Image Compression, Perceptual Distance	Code Available	7	5
Practical Efficiency of Muon for Pretraining	May 4, 2025		Code Available	7	5
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models	May 6, 2023	Math	Code Available	7	5
Low-code LLM: Graphical User Interface over Large Language Models	Apr 17, 2023	Prompt Engineering	Code Available	7	5
O1 Replication Journey: A Strategic Progress Report -- Part 1	Oct 8, 2024	Math, scientific discovery	Code Available	7	5
Large Concept Models: Language Modeling in a Sentence Representation Space	Dec 11, 2024	Language Modeling, Language Modelling	Code Available	7	5
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance	Jan 16, 2024	In-Context Learning	Code Available	7	5
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting	Dec 17, 2024	3DGS, Novel View Synthesis	Code Available	7	5
Scalable MatMul-free Language Modeling	Jun 4, 2024	GPU, Language Modeling	Code Available	7	5
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving	Jun 24, 2024	CPU, GPU	Code Available	7	5
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models	Jun 4, 2024	In-Context Learning, Language Modelling	Code Available	7	5
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark	Jun 5, 2025	Rhythm, Spoken Language Understanding	Code Available	7	5
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty	Jan 26, 2024	Code Generation, Instruction Following	Code Available	7	5
The Prompt Report: A Systematic Survey of Prompting Techniques	Jun 6, 2024	Prompt Engineering, Survey	Code Available	7	5
Qwen2.5-Omni Technical Report	Mar 26, 2025	Automatic Speech Recognition (ASR), GSM8K	Code Available	7	5
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation	Mar 1, 2024		Code Available	7	5
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems	Mar 31, 2025	AutoML, Continual Learning	Code Available	7	5