The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers


Showing 101–150 of 474,278 papers

Title	Date	Tasks	Status	Hype
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering	May 6, 2024	Bug fixing, Language Modeling	Code Available	11
HybridFlow: A Flexible and Efficient RLHF Framework	Sep 28, 2024	Large Language Model	Code Available	11
PaperBanana: Automating Academic Illustration for AI Scientists	Jan 30, 2026		Unverified	9
Qwen3-TTS Technical Report	Jan 22, 2026		Unverified	9
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling	Oct 14, 2024	Audio-Visual Synchronization, GPU	Code Available	9
Moshi: a speech-text foundation model for real-time dialogue	Sep 17, 2024	Action Detection, Activity Detection	Code Available	9
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on	Mar 4, 2024	Denoising, Image Generation	Code Available	9
RWKV-7 "Goose" with Expressive Dynamic State Evolution	Mar 18, 2025	In-Context Learning, Language Modeling	Code Available	9
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework	Apr 22, 2024	Language Modeling, Language Modelling	Code Available	9
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer	Oct 14, 2024	Image Generation, Image Reconstruction	Code Available	9
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer	Sep 1, 2024	Self-Supervised Learning, text-to-speech	Code Available	9
FinRobot: AI Agent for Equity Research and Valuation with Large Language Models	Nov 13, 2024	AI Agent	Code Available	9
Language agents achieve superhuman synthesis of scientific knowledge	Sep 10, 2024	Articles, Information Retrieval	Code Available	9
Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework	Oct 20, 2024	Code Completion, RAG	Code Available	9
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models	Mar 12, 2024	Benchmarking	Code Available	9
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second	Oct 2, 2024	Depth Estimation, GPU	Code Available	9
ORPO: Monolithic Preference Optimization without Reference Model	Mar 12, 2024	model	Code Available	9
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention	Jul 2, 2024	GPU, Language Modelling	Code Available	9
Sapiens: Foundation for Human Vision Models	Aug 22, 2024	2D Human Pose Estimation, 2D Pose Estimation	Code Available	9
SkyReels-V2: Infinite-length Film Generative Model	Apr 17, 2025	Large Language Model, model	Code Available	9
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models	Feb 5, 2024	Arithmetic Reasoning, Math	Code Available	9
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism	Jan 5, 2024		Code Available	9
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer	Jan 30, 2025	Image Generation, Model Compression	Code Available	9
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model	May 7, 2024	Language Modeling, Language Modelling	Code Available	9
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack	Jun 14, 2024	Question Answering, Retrieval-augmented Generation	Code Available	9
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training	Oct 9, 2024	GPU	Code Available	9
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion	Jul 1, 2024	Decision Making, Prediction	Code Available	9
Liger Kernel: Efficient Triton Kernels for LLM Training	Oct 14, 2024	Chunking, GPU	Code Available	9
CogVLM2: Visual Language Models for Image and Video Understanding	Aug 29, 2024	MM-Vet, MVBench	Code Available	9
SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection	Aug 6, 2024	Anomaly Detection, Defect Detection	Code Available	9
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks	Jan 25, 2024	Segmentation	Code Available	9
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation	May 2, 2024	motion prediction, Story Generation	Code Available	9
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction	Apr 3, 2024	Image Generation, Image Reconstruction	Code Available	9
LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection	Jun 5, 2024	Decoder, object-detection	Code Available	9
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving	Jan 2, 2025	GPU, Scheduling	Code Available	9
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration	Jun 3, 2024		Code Available	9
Symbolic Learning Enables Self-Evolving Agents	Jun 26, 2024		Code Available	9
Aviary: training language agents on challenging scientific tasks	Dec 30, 2024		Code Available	9
Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research	Nov 7, 2024	AI Agent, Decision Making	Code Available	9
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training	Feb 5, 2025	Self-Supervised Learning, Speech Enhancement	Code Available	9
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting	May 20, 2025		Code Available	9
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark	Jan 22, 2024		Code Available	9
YOLO-World: Real-Time Open-Vocabulary Object Detection	Jan 30, 2024	Instance Segmentation, Language Modeling	Code Available	9
Yi: Open Foundation Models by 01.AI	Mar 7, 2024	Attribute, Chatbot	Code Available	9
Steering Language Models with Game-Theoretic Solvers	Jan 24, 2024	Imitation Learning, Scheduling	Code Available	9
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild	Mar 25, 2024	Decoder, Language Modeling	Code Available	9
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts	May 20, 2024	Machine Translation, Translation	Code Available	9
LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model	Jun 7, 2024	Language Modeling, Language Modelling	Code Available	9
AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents	Feb 9, 2025	Large Language Model, RAG	Code Available	9
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm	Jun 5, 2025	GPU, Relation	Code Available	9