The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers


Showing 151–200 of 177,340 papers

Title	Date	Tasks	Status	Hype	Score
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts	May 20, 2024	Machine Translation, Translation	Code Available	9	5
LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model	Jun 7, 2024	Language Modeling, Language Modelling	Code Available	9	5
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack	Jun 14, 2024	Question Answering, Retrieval-augmented Generation	Code Available	9	5
NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context?	Jul 16, 2024	4k, 8k	Code Available	9	5
YuE: Scaling Open Foundation Models for Long-Form Music Generation	Mar 11, 2025	Form, In-Context Learning	Code Available	9	5
Depth Anything V2	Jun 13, 2024	Depth Estimation, Diversity	Code Available	9	5
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning	Mar 26, 2024	GPU, GSM8K	Code Available	9	5
Visually Descriptive Language Model for Vector Graphics Reasoning	Apr 9, 2024	Descriptive, Language Modeling	Code Available	9	5
KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation	Sep 10, 2024	Knowledge Graphs, Question Answering	Code Available	9	5
World Model on Million-Length Video And Language With Blockwise RingAttention	Feb 13, 2024	4k, Video Understanding	Code Available	9	5
UFO2: The Desktop AgentOS	Apr 20, 2025		Code Available	9	5
LLM4Decompile: Decompiling Binary Code with Large Language Models	Mar 8, 2024	HumanEval	Code Available	9	5
Do Large Language Models Need a Content Delivery Network?	Sep 16, 2024	In-Context Learning	Code Available	9	5
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Dec 13, 2024	Chart Understanding, Mixture-of-Experts	Code Available	9	5
LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync	Dec 12, 2024	Portrait Animation	Code Available	9	5
FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models	May 23, 2024	AI Agent, Decision Making	Code Available	9	5
MiniCPM4: Ultra-Efficient LLMs on End Devices	Jun 9, 2025	Large Language Model	Code Available	9	5
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding	Jul 14, 2025	Code Generation, Language Modeling	Code Available	9	5
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models	Dec 23, 2024	CPU	Code Available	9	5
OLMo: Accelerating the Science of Language Models	Feb 1, 2024	Language Modeling, Language Modelling	Code Available	9	5
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies	Apr 9, 2024	Domain Adaptation	Code Available	9	5
UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation	Mar 31, 2025	RAG, Retrieval	Code Available	9	5
Model Stock: All we need is just a few fine-tuned models	Mar 28, 2024	All	Code Available	9	5
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion	May 26, 2024	Language Modeling, Language Modelling	Code Available	9	5
Large Action Models: From Inception to Implementation	Dec 13, 2024	Action Generation	Code Available	9	5
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications	Mar 10, 2025	Continual Learning, Meta-Learning	Code Available	9	5
2 OLMo 2 Furious	Dec 31, 2024		Code Available	9	5
LTX-Video: Realtime Video Latent Diffusion	Dec 30, 2024	Denoising, GPU	Code Available	9	5
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models	Jan 17, 2024	Text-to-Video Generation, Video Generation	Code Available	9	5
s1: Simple test-time scaling	Jan 31, 2025	Language Modeling, Language Modelling	Code Available	9	5
FastVLM: Efficient Vision Encoding for Vision Language Models	Dec 17, 2024		Code Available	9	5
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data	Jan 19, 2024	Data Augmentation, Depth Estimation	Code Available	9	5
Arcee's MergeKit: A Toolkit for Merging Large Language Models	Mar 20, 2024	Language Modeling, Language Modelling	Code Available	9	5
SkyServe: Serving AI Models across Regions and Clouds with Spot Instances	Nov 3, 2024		Code Available	9	5
PP-FormulaNet: Bridging Accuracy and Efficiency in Advanced Formula Recognition	Mar 24, 2025		Code Available	9	5
When Do We Not Need Larger Vision Models?	Mar 19, 2024	Depth Estimation	Code Available	9	5
garak: A Framework for Security Probing Large Language Models	Jun 16, 2024	Red Teaming	Code Available	9	5
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression	Mar 19, 2024	GSM8K, Language Modelling	Code Available	9	5
Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment	Oct 12, 2024	Language Modelling, Philosophy	Code Available	9	5
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence	Jun 17, 2024	16k, Language Modeling	Code Available	9	5
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory	Nov 18, 2024	Object Tracking, Visual Object Tracking	Code Available	9	5
InternLM2 Technical Report	Mar 26, 2024	4k, Long-Context Understanding	Code Available	9	5
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception	Oct 16, 2024	Document Layout Analysis, document understanding	Code Available	9	5
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction	Mar 21, 2025	CPU, Document Layout Analysis	Code Available	9	5
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model	Apr 10, 2025	Language Modeling, Language Modelling	Code Available	9	5
UFO: A UI-Focused Agent for Windows OS Interaction	Feb 8, 2024	Navigate	Code Available	9	5
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation	Mar 26, 2024	Diversity, Face Reenactment	Code Available	9	5
RULER: What's the Real Context Size of Your Long-Context Language Models?	Apr 9, 2024	Long-Context Understanding	Code Available	9	5
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher	Jul 29, 2024	2D Semantic Segmentation task 1 (8 classes), graph construction	Code Available	9	5
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation	Jan 27, 2025		Code Available	9	5