The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers · 258,216 code links · 4,818 tasks

Papers


Showing 251–300 of 658,356 papers

Title	Date	Tasks	Status	Hype
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark	Jun 5, 2025	Rhythm, Spoken Language Understanding	Code Available	7
Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation	Jun 4, 2025		Code Available	7
OpenThoughts: Data Recipes for Reasoning Models	Jun 4, 2025	Math	Code Available	7
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning	May 30, 2025	GPU, Math	Code Available	7
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation	May 28, 2025	Human Animation, Instruction Following	Code Available	7
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer	May 28, 2025	Image Generation, Mixture-of-Experts	Code Available	7
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers	May 27, 2025		Code Available	7
SageAttention2++: A More Efficient Implementation of SageAttention2	May 27, 2025	Quantization, Video Generation	Code Available	7
HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters	May 26, 2025	Human Animation	Code Available	7
SEW: Self-Evolving Agentic Workflows for Automated Code Generation	May 24, 2025	Code Generation	Code Available	7
AI-Researcher: Autonomous Scientific Innovation	May 24, 2025	scientific discovery	Code Available	7
Speechless: Speech Instruction Training Without Speech for Low Resource Languages	May 23, 2025	speech-recognition, Speech Recognition	Code Available	7
ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval	May 22, 2025	Retrieval	Code Available	7
An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents	May 21, 2025	Reinforcement Learning (RL)	Code Available	7
Visual Agentic Reinforcement Fine-Tuning	May 20, 2025	Image Manipulation	Code Available	7
Faster Video Diffusion with Trainable Sparse Attention	May 19, 2025		Code Available	7
MAGI-1: Autoregressive Video Generation at Scale	May 19, 2025	Video Generation	Code Available	7
Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting	May 16, 2025	Time Series, Time Series Forecasting	Code Available	7
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training	May 16, 2025		Code Available	7
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis	May 14, 2025	Denoising, Depth Estimation	Code Available	7
Fast Text-to-Audio Generation with Adversarial Post-Training	May 13, 2025	ARC, Audio Generation	Code Available	7
HealthBench: Evaluating Large Language Models Towards Improved Human Health	May 13, 2025	Instruction Following, Multiple-choice	Code Available	7
Embedding Atlas: Low-Friction, Interactive Embedding Visualization	May 9, 2025	Friction	Code Available	7
Flow-GRPO: Training Flow Matching Models via Online RL	May 8, 2025	Denoising, Diversity	Code Available	7
Practical Efficiency of Muon for Pretraining	May 4, 2025		Code Available	7
Kimi-Audio Technical Report	Apr 25, 2025	Audio Question Answering, Question Answering	Code Available	7
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning	Apr 24, 2025	Decision Making, Reinforcement Learning (RL)	Code Available	7
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning	Apr 24, 2025	Code Generation	Code Available	7
Step1X-Edit: A Practical Framework for General Image Editing	Apr 24, 2025	Image Editing	Code Available	7
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning	Apr 23, 2025	Multimodal Reasoning, reinforcement-learning	Code Available	7
TTRL: Test-Time Reinforcement Learning	Apr 22, 2025	Math, reinforcement-learning	Code Available	7
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding	Apr 17, 2025	Video Question Answering, Video Understanding	Code Available	7
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model	Apr 17, 2025	Code Generation, CPU	Code Available	7
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents	Apr 16, 2025		Code Available	7
Aligning Anime Video Generation with Human Feedback	Apr 14, 2025	Video Generation	Code Available	7
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search	Apr 10, 2025	scientific discovery	Code Available	7
A Scalable Approach to Clustering Embedding Projections	Apr 9, 2025	Clustering, Density Estimation	Code Available	7
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought	Apr 8, 2025	Language Modeling, Language Modelling	Code Available	7
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems	Mar 31, 2025	AutoML, Continual Learning	Code Available	7
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model	Mar 31, 2025		Code Available	7
Large Language Model Agent: A Survey on Methodology, Applications and Challenges	Mar 27, 2025	Language Modeling, Language Modelling	Code Available	7
Open Deep Search: Democratizing Search with Open-source Reasoning Agents	Mar 26, 2025	10-shot image generation	Code Available	7
Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization	Mar 26, 2025	CPU, GPU	Code Available	7
Qwen2.5-Omni Technical Report	Mar 26, 2025	Automatic Speech Recognition (ASR), GSM8K	Code Available	7
Scaling Vision Pre-Training to 4K Resolution	Mar 25, 2025	4k, Contrastive Learning	Code Available	7
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild	Mar 24, 2025	Instruction Following, Math	Code Available	7
Enhancing Fourier Neural Operators with Local Spatial Features	Mar 22, 2025	Computational Efficiency	Code Available	7
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity	Mar 20, 2025	Image Generation	Code Available	7
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference	Mar 17, 2025	Mamba, Math	Code Available	7
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds	Mar 13, 2025	3D Human Reconstruction	Code Available	7