The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

510,095 papers · 251,776 code links · 4,818 tasks

Papers


Showing 401–450 of 177,341 papers

Title	Date	Tasks	Status	Hype	Score
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark	Jun 5, 2025	Rhythm, Spoken Language Understanding	Code Available	7	5
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty	Jan 26, 2024	Code Generation, Instruction Following	Code Available	7	5
The Prompt Report: A Systematic Survey of Prompting Techniques	Jun 6, 2024	Prompt Engineering, Survey	Code Available	7	5
Qwen2.5-Omni Technical Report	Mar 26, 2025	Automatic Speech Recognition (ASR), GSM8K	Code Available	7	5
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation	Mar 1, 2024		Code Available	7	5
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems	Mar 31, 2025	AutoML, Continual Learning	Code Available	7	5
Labeling supervised fine-tuning data with the scaling law	May 5, 2024	coreference-resolution, Coreference Resolution	Code Available	7	5
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models	Jan 21, 2025	RAG, Retrieval	Code Available	7	5
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models	May 16, 2024	In-Context Learning, Question Answering	Code Available	7	5
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines	Oct 5, 2023	Language Modeling, Language Modelling	Code Available	7	5
TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI	May 29, 2024	MRI segmentation	Code Available	7	5
RouteLLM: Learning to Route LLMs with Preference Data	Jun 26, 2024	Data Augmentation, Transfer Learning	Code Available	7	5
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation	Apr 3, 2024	Image Generation, Text to Image Generation	Code Available	7	5
YOLOv12: Attention-Centric Real-Time Object Detectors	Feb 18, 2025	GPU, Object	Code Available	7	5
Long-form music generation with latent diffusion	Apr 16, 2024	Audio Generation, Form	Code Available	7	5
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow	Jan 28, 2025	Prompt Engineering, Question Answering	Code Available	7	5
Global Structure-from-Motion Revisited	Jul 29, 2024	16k	Code Available	7	5
Revisiting Feature Prediction for Learning Visual Representations from Video	Feb 15, 2024	Prediction	Code Available	7	5
Fast Text-to-Audio Generation with Adversarial Post-Training	May 13, 2025	ARC, Audio Generation	Code Available	7	5
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot	Dec 3, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	7	5
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning	Jun 11, 2025	Action Anticipation, Large Language Model	Code Available	7	5
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention	Jun 16, 2025	Mixture-of-Experts, Reinforcement Learning (RL)	Code Available	7	5
Flow Matching Guide and Code	Dec 9, 2024	Text Generation	Code Available	7	5
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads	Jan 19, 2024		Code Available	7	5
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI	Oct 1, 2024	GPU, Imitation Learning	Code Available	7	5
Improving Diffusion Models for Authentic Virtual Try-on in the Wild	Mar 8, 2024	Virtual Try-on	Code Available	7	5
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	Jun 9, 2023	Chatbot, Language Modelling	Code Available	7	5
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search	Apr 10, 2025	scientific discovery	Code Available	7	5
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models	Feb 29, 2024	Language Modelling, Mamba	Code Available	7	5
Skywork-R1V3 Technical Report	Jul 8, 2025	cross-modal alignment, Mathematical Reasoning	Code Available	7	5
Interactive Prompt Debugging with Sequence Salience	Apr 11, 2024	Sentence, text-classification	Code Available	7	5
gsplat: An Open-Source Library for Gaussian Splatting	Sep 10, 2024		Code Available	7	5
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers	Oct 31, 2022	GPU, Language Modelling	Code Available	7	5
EvoAgentX: An Automated Framework for Evolving Agentic Workflows	Jul 4, 2025	Code Generation, Math	Code Available	7	5
DataComp-LM: In search of the next generation of training sets for language models	Jun 17, 2024	Language Modelling, MMLU	Code Available	7	5
VITA: Towards Open-Source Interactive Omni Multimodal LLM	Aug 9, 2024	Language Modeling, Language Modelling	Code Available	7	5
Segment Anything in Medical Images and Videos: Benchmark and Deployment	Aug 6, 2024	Benchmarking, Segmentation	Code Available	7	5
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models	Apr 8, 2024		Code Available	7	5
Cradle: Empowering Foundation Agents Towards General Computer Control	Mar 5, 2024	Efficient Exploration	Code Available	7	5
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments	Apr 11, 2024	Benchmarking	Code Available	7	5
Efficient Track Anything	Nov 28, 2024	Object, Segmentation	Code Available	7	5
Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach	Apr 7, 2024	Efficient Exploration, Hyperparameter Optimization	Code Available	7	5
Embedding Atlas: Low-Friction, Interactive Embedding Visualization	May 9, 2025	Friction	Code Available	7	5
A Library for Learning Neural Operators	Dec 13, 2024	Operator learning	Code Available	7	5
Kimi k1.5: Scaling Reinforcement Learning with LLMs	Jan 22, 2025	Math, reinforcement-learning	Code Available	7	5
AutoCodeRover: Autonomous Program Improvement	Apr 8, 2024	Bug fixing, Code Search	Code Available	7	5
S*: Test Time Scaling for Code Generation	Feb 20, 2025	Code Generation, Math	Code Available	7	5
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer	Jul 24, 2024	Data Augmentation, Decoder	Code Available	7	5
AI-Researcher: Autonomous Scientific Innovation	May 24, 2025	scientific discovery	Code Available	7	5
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models	May 23, 2024	Hippocampus, Knowledge Graphs	Code Available	7	5