The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 776–800 of 177,339 papers

Title	Date	Tasks	Status	Hype	Score
Secrets of RLHF in Large Language Models Part II: Reward Modeling	Jan 11, 2024	Contrastive Learning, Meta-Learning	Code Available	5	5
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion	Mar 11, 2024	Image Inpainting	Code Available	5	5
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers	Apr 27, 2025	Hallucination, Question Answering	Code Available	5	5
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit	Mar 29, 2022	Decoder, Language Modelling	Code Available	5	5
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models	Jan 25, 2024		Code Available	5	5
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering	Oct 9, 2024		Code Available	5	5
Free Process Rewards without Process Labels	Dec 2, 2024	Math	Code Available	5	5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Jun 11, 2024	Multiple-choice, Question Answering	Code Available	5	5
Executable Code Actions Elicit Better LLM Agents	Feb 1, 2024	Language Modelling, Large Language Model	Code Available	5	5
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation	Feb 28, 2025	Audio Generation, Form	Code Available	5	5
PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation	Jun 10, 2024	3D Reconstruction, Autonomous Driving	Code Available	5	5
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth	Feb 23, 2023	Depth Estimation, Monocular Depth Estimation	Code Available	5	5
Continuous Thought Machines	May 8, 2025	Computational Efficiency, Question Answering	Code Available	5	5
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models	Jan 29, 2024	Decoder, Mixture-of-Experts	Code Available	5	5
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining	May 12, 2025	Language Modeling, Language Modelling	Code Available	5	5
Efficient Streaming Language Models with Attention Sinks	Sep 29, 2023	Language Modeling, Language Modelling	Code Available	5	5
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs	Nov 21, 2024	Retrieval	Code Available	5	5
Group-in-Group Policy Optimization for LLM Agent Training	May 16, 2025	GPU, Mathematical Reasoning	Code Available	5	5
Sequencer: Deep LSTM for Image Classification	May 4, 2022	Domain Generalization, image-classification	Code Available	5	5
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement	May 26, 2025		Code Available	5	5
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents	May 29, 2025	Meta-Learning	Code Available	5	5
EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration	Jun 1, 2025		Code Available	5	5
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models	Jun 5, 2025	Reranking, Retrieval	Code Available	5	5
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models	Jun 15, 2025	Logical Reasoning, Reinforcement Learning (RL)	Code Available	5	5
Matrix-Game: Interactive World Foundation Model	Jun 23, 2025	Minecraft, model	Code Available	5	5