The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers · 258,216 code links · 4,818 tasks

Papers


Showing 201–250 of 658,356 papers

Title	Date	Tasks	Status	Hype
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression	Mar 19, 2024	GSM8K, Language Modelling	Code Available	9
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models	Mar 12, 2024	Benchmarking	Code Available	9
ORPO: Monolithic Preference Optimization without Reference Model	Mar 12, 2024	model	Code Available	9
LLM4Decompile: Decompiling Binary Code with Large Language Models	Mar 8, 2024	HumanEval	Code Available	9
Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble	Mar 7, 2024	Anomaly Detection, GPU	Code Available	9
Yi: Open Foundation Models by 01.AI	Mar 7, 2024	Attribute, Chatbot	Code Available	9
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on	Mar 4, 2024	Denoising, Image Generation	Code Available	9
TripoSR: Fast 3D Object Reconstruction from a Single Image	Mar 4, 2024	3D Generation, 3D Object Reconstruction	Code Available	9
World Model on Million-Length Video And Language With Blockwise RingAttention	Feb 13, 2024	4k, Video Understanding	Code Available	9
UFO: A UI-Focused Agent for Windows OS Interaction	Feb 8, 2024	Navigate	Code Available	9
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models	Feb 5, 2024	Arithmetic Reasoning, Math	Code Available	9
Natural language guidance of high-fidelity text-to-speech with synthetic annotations	Feb 2, 2024	In-Context Learning, Language Modeling	Code Available	9
OLMo: Accelerating the Science of Language Models	Feb 1, 2024	Language Modeling, Language Modelling	Code Available	9
YOLO-World: Real-Time Open-Vocabulary Object Detection	Jan 30, 2024	Instance Segmentation, Language Modeling	Code Available	9
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks	Jan 25, 2024	Segmentation	Code Available	9
Steering Language Models with Game-Theoretic Solvers	Jan 24, 2024	Imitation Learning, Scheduling	Code Available	9
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark	Jan 22, 2024		Code Available	9
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data	Jan 19, 2024	Data Augmentation, Depth Estimation	Code Available	9
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models	Jan 17, 2024	Text-to-Video Generation, Video Generation	Code Available	9
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism	Jan 5, 2024		Code Available	9
Perception Encoder: The best visual embeddings are not at the output of the network	Apr 17, 2025	Depth Estimation, Language Modeling	Code Available	8
GPT4All: An Ecosystem of Open Source Compressed Language Models	Nov 6, 2023		Code Available	8
Llama 2: Open Foundation and Fine-Tuned Chat Models	Jul 18, 2023	Arithmetic Reasoning	Code Available	8
Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition	Jul 17, 2023	Decoder, Language Modeling	Code Available	8
DETRs Beat YOLOs on Real-time Object Detection	Apr 17, 2023	2D Object Detection, Decoder	Code Available	8
Robust Speech Recognition via Large-Scale Weak Supervision	Dec 6, 2022	Robust Speech Recognition, speech-recognition	Code Available	8
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models	Oct 18, 2022	Language Modelling, Sentence	Code Available	8
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis	Jun 2, 2022	Document Layout Analysis, Object Detection	Code Available	8
Attention Residuals	Mar 16, 2026		Unverified	7
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning	Mar 12, 2026		Unverified	7
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem	Mar 12, 2026		Unverified	7
Pretraining Large Language Models with NVFP4	Mar 4, 2026		Unverified	7
dLLM: Simple Diffusion Language Modeling	Feb 26, 2026		Unverified	7
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning	Feb 26, 2026		Unverified	7
SAM 3D Body: Robust Full-Body Human Mesh Recovery	Feb 17, 2026		Unverified	7
Qwen3-ASR Technical Report	Jan 30, 2026		Unverified	7
Advancing Open-source World Models	Jan 28, 2026		Unverified	7
Is Diversity All You Need for Scalable Robotic Manipulation?	Jul 8, 2025	All, Diversity	Code Available	7
Skywork-R1V3 Technical Report	Jul 8, 2025	cross-modal alignment, Mathematical Reasoning	Code Available	7
EvoAgentX: An Automated Framework for Evolving Agentic Workflows	Jul 4, 2025	Code Generation, Math	Code Available	7
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning	Jul 1, 2025	document understanding, Multimodal Reasoning	Code Available	7
OmniGen2: Exploration to Advanced Multimodal Generation	Jun 23, 2025	Image Generation, multimodal generation	Code Available	7
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets	Jun 17, 2025	Language Modeling, Language Modelling	Code Available	7
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention	Jun 16, 2025	Mixture-of-Experts, Reinforcement Learning (RL)	Code Available	7
AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving	Jun 14, 2025		Code Available	7
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation	Jun 11, 2025	4k	Code Available	7
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning	Jun 11, 2025	Action Anticipation, Large Language Model	Code Available	7
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model	Jun 10, 2025	Language Modeling, Language Modelling	Code Available	7
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library	Jun 6, 2025	Management	Code Available	7
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark	Jun 5, 2025	Rhythm, Spoken Language Understanding	Code Available	7