The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers; 248,326 code links; 4,818 tasks

Papers

Showing 8,001–8,025 of 177,340 papers

Title	Date	Tasks	Status	Hype	Score
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs	Oct 10, 2024	Active Learning, Language Modeling	Code Available	2	5
Evaluating Quantized Large Language Models	Feb 28, 2024	Mamba, Quantization	Code Available	2	5
Edu-ConvoKit: An Open-Source Library for Education Conversation Data	Feb 7, 2024		Code Available	2	5
Calibrated Self-Rewarding Vision Language Models	May 23, 2024	Hallucination, Language Modelling	Code Available	2	5
PERT: Pre-training BERT with Permuted Language Model	Mar 14, 2022	Language Modeling, Language Modelling	Code Available	2	5
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement	Mar 1, 2025	Language Modeling, Language Modelling	Code Available	2	5
Training Diffusion Models with Reinforcement Learning	May 22, 2023	Decision Making, Denoising	Code Available	2	5
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction	Oct 5, 2023	Event Argument Extraction, Event Extraction	Code Available	2	5
All in One: Exploring Unified Video-Language Pre-training	Mar 14, 2022	All, Language Modelling	Code Available	2	5
A Survey on Multimodal Large Language Models for Autonomous Driving	Nov 21, 2023	Autonomous Driving	Code Available	2	5
Towards A Unified Conformer Structure: from ASR to ASV Task	Nov 14, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	2	5
DocPrompting: Generating Code by Retrieving the Docs	Jul 13, 2022	Code Generation	Code Available	2	5
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation	Mar 4, 2024	Semantic Segmentation, Semi-Supervised Semantic Segmentation	Code Available	2	5
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents	May 30, 2025	Benchmarking, Blocking	Code Available	2	5
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives	Nov 9, 2022	Disentanglement, Video Generation	Code Available	2	5
Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models	Dec 26, 2022	Image Reconstruction, Representation Learning	Code Available	2	5
TGL: A General Framework for Temporal GNN Training on Billion-Scale Graphs	Mar 28, 2022	CPU, GPU	Code Available	2	5
Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs	Oct 18, 2022	Deep Learning, Scheduling	Code Available	2	5
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study	Mar 13, 2024	Code Generation	Code Available	2	5
REEF: Representation Encoding Fingerprints for Large Language Models	Oct 18, 2024		Code Available	2	5
Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation	Mar 20, 2024	Semantic Segmentation, Weakly supervised Semantic Segmentation	Code Available	2	5
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model	Aug 31, 2022	Denoising, Motion Generation	Code Available	2	5
Large language models surpass human experts in predicting neuroscience results	Mar 4, 2024		Code Available	2	5
Owl-1: Omni World Model for Consistent Long Video Generation	Dec 12, 2024	Video Generation	Code Available	2	5
Diving Deeper Into Pedestrian Behavior Understanding: Intention Estimation, Action Prediction, and Event Risk Assessment	Jun 29, 2024	Prediction	Code Available	2	5