The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers


Showing 1401–1425 of 659,983 papers

Title	Date	Tasks	Status	Hype
Atom of Thoughts for Markov LLM Test-Time Scaling	Feb 17, 2025		Code Available	4
A-MEM: Agentic Memory for LLM Agents	Feb 17, 2025	Large Language Model	Code Available	4
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention	Feb 16, 2025		Code Available	4
SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers	Feb 15, 2025	Image Animation, Portrait Animation	Code Available	4
KernelBench: Can LLMs Write Efficient GPU Kernels?	Feb 14, 2025	GPU	Code Available	4
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models	Feb 13, 2025	Question Answering, RAG	Code Available	4
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion	Feb 12, 2025	Image Relighting	Code Available	4
AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society	Feb 12, 2025		Code Available	4
Enhance-A-Video: Better Generated Video for Free	Feb 11, 2025	Video Generation	Code Available	4
Training Sparse Mixture Of Experts Text Embedding Models	Feb 11, 2025	Mixture-of-Experts, RAG	Code Available	4
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction	Feb 11, 2025	Code Generation, Math	Code Available	4
Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM	Feb 10, 2025	Language Modeling, Language Modelling	Code Available	4
Accelerating Data Processing and Benchmarking of AI Models for Pathology	Feb 10, 2025	Benchmarking	Code Available	4
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates	Feb 10, 2025	Hierarchical Reinforcement Learning, Language Modeling	Code Available	4
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach	Feb 7, 2025	Language Modeling, Language Modelling	Code Available	4
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation	Feb 7, 2025	Audio Generation, Denoising	Code Available	4
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound	Feb 7, 2025	Benchmarking	Code Available	4
Self-Supervised Prompt Optimization	Feb 7, 2025		Code Available	4
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis	Feb 6, 2025	Speech Synthesis	Code Available	4
Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective	Feb 6, 2025		Code Available	4
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation	Feb 4, 2025	Benchmarking, Information Retrieval	Code Available	4
Sundial: A Family of Highly Capable Time Series Foundation Models	Feb 2, 2025	Representation Learning, Time Series	Code Available	4
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models	Jan 31, 2025	Caption Generation, Language Modeling	Code Available	4
Transcoders Beat Sparse Autoencoders for Interpretability	Jan 31, 2025		Code Available	4
Molecular-driven Foundation Model for Oncologic Pathology	Jan 28, 2025	Benchmarking, Diagnostic	Code Available	4