The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers


Showing 826–850 of 177,339 papers

Title	Date	Tasks	Status	Hype	Score
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images	Mar 21, 2024	3D Reconstruction, Generalizable Novel View Synthesis	Code Available	5	5
Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean	Apr 18, 2024	Automated Theorem Proving, Hallucination	Code Available	5	5
Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation	May 2, 2024	MuJoCo, Reinforcement Learning (RL)	Code Available	5	5
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation	May 31, 2024	MuJoCo, reinforcement-learning	Code Available	5	5
The Vizier Gaussian Process Bandit Algorithm	Aug 21, 2024	Bayesian Optimization	Code Available	5	5
Fundamental Components of Deep Learning: A category-theoretic approach	Mar 13, 2024	Deep Learning, Descriptive	Code Available	5	5
Magma: A Foundation Model for Multimodal AI Agents	Feb 18, 2025	Autonomous Web Navigation, Image to text	Code Available	5	5
LiveBench: A Challenging, Contamination-Limited LLM Benchmark	Jun 27, 2024	Articles, Instruction Following	Code Available	5	5
FuXi-2.0: Advancing machine learning weather forecasting model for practical applications	Sep 11, 2024	Weather Forecasting	Code Available	5	5
Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement	Mar 12, 2023	Image Enhancement, Low-light Image Deblurring and Enhancement	Code Available	5	5
Neural Fields in Robotics: A Survey	Oct 26, 2024	3D Reconstruction, Autonomous Driving	Code Available	5	5
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs	Dec 25, 2024	Reinforcement Learning (RL)	Code Available	5	5
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?	Feb 17, 2025		Code Available	5	5
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models	Feb 10, 2025	3D Generation, 3D Reconstruction	Code Available	5	5
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis	Mar 14, 2025	Program Synthesis	Code Available	5	5
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness	Mar 27, 2025	Anomaly Detection, Video Generation	Code Available	5	5
ZeroSearch: Incentivize the Search Capability of LLMs without Searching	May 7, 2025	Reinforcement Learning (RL), Retrieval	Code Available	5	5
Show-o2: Improved Native Unified Multimodal Models	Jun 18, 2025	Language Modeling, Language Modelling	Code Available	5	5
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards	May 30, 2025	reinforcement-learning, Reinforcement Learning	Code Available	5	5
DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models	Jun 14, 2022	Causal Inference	Code Available	5	5
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning	Feb 20, 2024	Autonomous Driving, NavSim	Code Available	5	5
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral	Mar 4, 2024	Language Modeling, Language Modelling	Code Available	5	5
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data	Aug 1, 2024		Code Available	5	5
Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction	May 20, 2024	Drug Design, Molecular Docking	Code Available	5	5
Showing Many Labels in Multi-label Classification Models: An Empirical Study of Adversarial Examples	Sep 26, 2024	Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION	Code Available	5	5