The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers


Showing 751–800 of 177,339 papers

Title	Date	Tasks	Status	Hype	Score
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems	Jul 17, 2024	Autonomous Web Navigation, Denoising	Code Available	5	5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale	Aug 15, 2022	GPU, Language Modelling	Code Available	5	5
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation	Jan 24, 2024	text-to-speech, Text to Speech	Code Available	5	5
MMBench: Is Your Multi-modal Model an All-around Player?	Jul 12, 2023	All, Instruction Following	Code Available	5	5
TAPVid-3D: A Benchmark for Tracking Any Point in 3D	Jul 8, 2024	Point Tracking	Code Available	5	5
Retrieval-Augmented Generation for AI-Generated Content: A Survey	Feb 29, 2024	Information Retrieval, Large Language Model	Code Available	5	5
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models	Sep 21, 2024	Language Modeling, Language Modelling	Code Available	5	5
Improved Distribution Matching Distillation for Fast Image Synthesis	May 23, 2024	Image Generation	Code Available	5	5
Large Language Model based Multi-Agents: A Survey of Progress and Challenges	Jan 21, 2024	Decision Making, Language Modeling	Code Available	5	5
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation	Jun 10, 2024	Conditional Image Generation, Image Generation	Code Available	5	5
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework	Mar 20, 2024	Image to Video Generation, Text-to-Video Generation	Code Available	5	5
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation	Feb 14, 2025	Language Modeling, Language Modelling	Code Available	5	5
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models	Jun 9, 2024	Instruction Following	Code Available	5	5
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks	Jun 12, 2024	Image Generation, Language Modeling	Code Available	5	5
Diffusion for World Modeling: Visual Details Matter in Atari	May 20, 2024	Image Generation, reinforcement-learning	Code Available	5	5
Flashlight: Enabling Innovation in Tools for Machine Learning	Jan 29, 2022	BIG-bench Machine Learning	Code Available	5	5
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models	Jan 1, 2024	Code Generation, parameter-efficient fine-tuning	Code Available	5	5
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think	Oct 9, 2024	Denoising, Image Generation	Code Available	5	5
BootsTAP: Bootstrapped Training for Tracking-Any-Point	Feb 1, 2024	Point Tracking	Code Available	5	5
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset	May 14, 2025	Image Generation	Code Available	5	5
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion	Aug 2, 2022	Image Generation, Personalized Image Generation	Code Available	5	5
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation	Jun 26, 2024	Text-to-Video Generation, Video Generation	Code Available	5	5
OffsetBias: Leveraging Debiased Data for Tuning Evaluators	Jul 9, 2024		Code Available	5	5
Meta-World+: An Improved, Standardized, RL Benchmark	May 16, 2025	Meta Reinforcement Learning, reinforcement-learning	Code Available	5	5
MONAI: An open-source framework for deep learning in healthcare	Nov 4, 2022	Deep Learning, Medical Image Classification	Code Available	5	5
Secrets of RLHF in Large Language Models Part II: Reward Modeling	Jan 11, 2024	Contrastive Learning, Meta-Learning	Code Available	5	5
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion	Mar 11, 2024	Image Inpainting	Code Available	5	5
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers	Apr 27, 2025	Hallucination, Question Answering	Code Available	5	5
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit	Mar 29, 2022	Decoder, Language Modelling	Code Available	5	5
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models	Jan 25, 2024		Code Available	5	5
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering	Oct 9, 2024		Code Available	5	5
Free Process Rewards without Process Labels	Dec 2, 2024	Math	Code Available	5	5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Jun 11, 2024	Multiple-choice, Question Answering	Code Available	5	5
Executable Code Actions Elicit Better LLM Agents	Feb 1, 2024	Language Modelling, Large Language Model	Code Available	5	5
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation	Feb 28, 2025	Audio Generation, Form	Code Available	5	5
PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation	Jun 10, 2024	3D Reconstruction, Autonomous Driving	Code Available	5	5
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth	Feb 23, 2023	Depth Estimation, Monocular Depth Estimation	Code Available	5	5
Continuous Thought Machines	May 8, 2025	Computational Efficiency, Question Answering	Code Available	5	5
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models	Jan 29, 2024	Decoder, Mixture-of-Experts	Code Available	5	5
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining	May 12, 2025	Language Modeling, Language Modelling	Code Available	5	5
Efficient Streaming Language Models with Attention Sinks	Sep 29, 2023	Language Modeling, Language Modelling	Code Available	5	5
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs	Nov 21, 2024	Retrieval	Code Available	5	5
Group-in-Group Policy Optimization for LLM Agent Training	May 16, 2025	GPU, Mathematical Reasoning	Code Available	5	5
Sequencer: Deep LSTM for Image Classification	May 4, 2022	Domain Generalization, image-classification	Code Available	5	5
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement	May 26, 2025		Code Available	5	5
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents	May 29, 2025	Meta-Learning	Code Available	5	5
EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration	Jun 1, 2025		Code Available	5	5
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models	Jun 5, 2025	Reranking, Retrieval	Code Available	5	5
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models	Jun 15, 2025	Logical Reasoning, Reinforcement Learning (RL)	Code Available	5	5
Matrix-Game: Interactive World Foundation Model	Jun 23, 2025	Minecraft, model	Code Available	5	5