The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers


Showing 751–775 of 177,339 papers

Title	Date	Tasks	Status	Hype	Score
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems	Jul 17, 2024	Autonomous Web Navigation, Denoising	Code Available	5	5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale	Aug 15, 2022	GPU, Language Modelling	Code Available	5	5
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation	Jan 24, 2024	text-to-speech, Text to Speech	Code Available	5	5
MMBench: Is Your Multi-modal Model an All-around Player?	Jul 12, 2023	All, Instruction Following	Code Available	5	5
TAPVid-3D: A Benchmark for Tracking Any Point in 3D	Jul 8, 2024	Point Tracking	Code Available	5	5
Retrieval-Augmented Generation for AI-Generated Content: A Survey	Feb 29, 2024	Information Retrieval, Large Language Model	Code Available	5	5
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models	Sep 21, 2024	Language Modeling, Language Modelling	Code Available	5	5
Improved Distribution Matching Distillation for Fast Image Synthesis	May 23, 2024	Image Generation	Code Available	5	5
Large Language Model based Multi-Agents: A Survey of Progress and Challenges	Jan 21, 2024	Decision Making, Language Modeling	Code Available	5	5
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation	Jun 10, 2024	Conditional Image Generation, Image Generation	Code Available	5	5
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework	Mar 20, 2024	Image to Video Generation, Text-to-Video Generation	Code Available	5	5
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation	Feb 14, 2025	Language Modeling, Language Modelling	Code Available	5	5
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models	Jun 9, 2024	Instruction Following	Code Available	5	5
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks	Jun 12, 2024	Image Generation, Language Modeling	Code Available	5	5
Diffusion for World Modeling: Visual Details Matter in Atari	May 20, 2024	Image Generation, reinforcement-learning	Code Available	5	5
Flashlight: Enabling Innovation in Tools for Machine Learning	Jan 29, 2022	BIG-bench Machine Learning	Code Available	5	5
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models	Jan 1, 2024	Code Generation, parameter-efficient fine-tuning	Code Available	5	5
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think	Oct 9, 2024	Denoising, Image Generation	Code Available	5	5
BootsTAP: Bootstrapped Training for Tracking-Any-Point	Feb 1, 2024	Point Tracking	Code Available	5	5
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset	May 14, 2025	Image Generation	Code Available	5	5
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion	Aug 2, 2022	Image Generation, Personalized Image Generation	Code Available	5	5
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation	Jun 26, 2024	Text-to-Video Generation, Video Generation	Code Available	5	5
OffsetBias: Leveraging Debiased Data for Tuning Evaluators	Jul 9, 2024		Code Available	5	5
Meta-World+: An Improved, Standardized, RL Benchmark	May 16, 2025	Meta Reinforcement Learning, reinforcement-learning	Code Available	5	5
MONAI: An open-source framework for deep learning in healthcare	Nov 4, 2022	Deep Learning, Medical Image Classification	Code Available	5	5