The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers


Showing 401–425 of 659,983 papers

Title	Date	Tasks	Status	Hype
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search	Apr 10, 2025	scientific discovery	Code Available	7
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models	Feb 29, 2024	Language Modelling, Mamba	Code Available	7
Skywork-R1V3 Technical Report	Jul 8, 2025	cross-modal alignment, Mathematical Reasoning	Code Available	7
Interactive Prompt Debugging with Sequence Salience	Apr 11, 2024	Sentence, text-classification	Code Available	7
gsplat: An Open-Source Library for Gaussian Splatting	Sep 10, 2024		Code Available	7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers	Oct 31, 2022	GPU, Language Modelling	Code Available	7
EvoAgentX: An Automated Framework for Evolving Agentic Workflows	Jul 4, 2025	Code Generation, Math	Code Available	7
DataComp-LM: In search of the next generation of training sets for language models	Jun 17, 2024	Language Modelling, MMLU	Code Available	7
VITA: Towards Open-Source Interactive Omni Multimodal LLM	Aug 9, 2024	Language Modeling, Language Modelling	Code Available	7
Segment Anything in Medical Images and Videos: Benchmark and Deployment	Aug 6, 2024	Benchmarking, Segmentation	Code Available	7
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models	Apr 8, 2024		Code Available	7
Cradle: Empowering Foundation Agents Towards General Computer Control	Mar 5, 2024	Efficient Exploration	Code Available	7
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments	Apr 11, 2024	Benchmarking	Code Available	7
Efficient Track Anything	Nov 28, 2024	Object, Segmentation	Code Available	7
Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach	Apr 7, 2024	Efficient Exploration, Hyperparameter Optimization	Code Available	7
Embedding Atlas: Low-Friction, Interactive Embedding Visualization	May 9, 2025	Friction	Code Available	7
A Library for Learning Neural Operators	Dec 13, 2024	Operator learning	Code Available	7
Kimi k1.5: Scaling Reinforcement Learning with LLMs	Jan 22, 2025	Math, reinforcement-learning	Code Available	7
AutoCodeRover: Autonomous Program Improvement	Apr 8, 2024	Bug fixing, Code Search	Code Available	7
S*: Test Time Scaling for Code Generation	Feb 20, 2025	Code Generation, Math	Code Available	7
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer	Jul 24, 2024	Data Augmentation, Decoder	Code Available	7
AI-Researcher: Autonomous Scientific Innovation	May 24, 2025	scientific discovery	Code Available	7
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models	May 23, 2024	Hippocampus, Knowledge Graphs	Code Available	7
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models	Jan 10, 2024	GPU, Image Generation	Code Available	7
Large Language Model Agent: A Survey on Methodology, Applications and Challenges	Mar 27, 2025	Language Modeling, Language Modelling	Code Available	7