The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 2,126–2,150 of 177,340 papers

Title	Date	Tasks	Status	Hype	Score
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution	Jan 5, 2024	HumanEval, Prediction	Code Available	4	5
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation	May 20, 2025	MME, Multiple-choice	Code Available	4	5
CitationMap: A Python Tool to Identify and Visualize Your Google Scholar Citations Around the World	Aug 2, 2024	Citation Visualization, Data Visualization	Code Available	4	5
Real-time volumetric rendering of dynamic humans	Mar 21, 2023	3D Reconstruction, GPU	Code Available	4	5
Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces	Oct 21, 2024	Code Generation, scientific discovery	Code Available	4	5
DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection	Jan 1, 2020	Attribute, DeepFake Detection	Code Available	4	5
Inductive Moment Matching	Mar 10, 2025		Code Available	4	5
Polysemous codes	Sep 7, 2016	Quantization	Code Available	4	5
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?	Oct 10, 2023	Bug fixing, Code Generation	Code Available	4	5
RUMI: Rummaging Using Mutual Information	Aug 19, 2024	Model Predictive Control, Object	Code Available	4	5
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks	Mar 27, 2023	text annotation, Text Classification	Code Available	4	5
A General Theoretical Paradigm to Understand Learning from Human Preferences	Oct 18, 2023		Code Available	4	5
Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry	Jun 3, 2024	Depth Estimation, Monocular Depth Estimation	Code Available	4	5
MUSE: Machine Unlearning Six-Way Evaluation for Language Models	Jul 8, 2024	Articles, Machine Unlearning	Code Available	4	5
Stock Price Prediction via Discovering Multi-Frequency Trading Patterns	Aug 13, 2017	Prediction, Stock Price Prediction	Code Available	4	5
The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence	Mar 20, 2024		Code Available	4	5
Fast Transformer Decoding: One Write-Head is All You Need	Nov 6, 2019	All, Language Modelling	Code Available	4	5
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data	Oct 2, 2024	Arithmetic Reasoning, Large Language Model	Code Available	4	5
DisCo-DSO: Coupling Discrete and Continuous Optimization for Efficient Generative Design in Hybrid Spaces	Dec 15, 2024	Symbolic Regression	Code Available	4	5
Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms	Mar 10, 2025		Code Available	4	5
Tiny-PULP-Dronets: Squeezing Neural Networks for Faster and Lighter Inference on Multi-Tasking Autonomous Nano-Drones	Jul 2, 2024	Autonomous Navigation	Code Available	4	5
ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding	Jan 14, 2025	RAG, Retrieval	Code Available	4	5
PointVLA: Injecting the 3D World into Vision-Language-Action Models	Mar 10, 2025	Imitation Learning, Spatial Reasoning	Code Available	4	5
ViViD: Video Virtual Try-on using Diffusion Models	May 20, 2024	Virtual Try-on	Code Available	4	5
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image	Mar 18, 2024	3D geometry, 3D Reconstruction	Code Available	4	5