The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers


Showing 7,126–7,150 of 177,340 papers

Title	Date	Tasks	Status	Hype	Score
Discovering uncertainty: Gaussian constitutive neural networks with correlated weights	Mar 16, 2025		Code Available	2	5
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback	Jun 26, 2023	Benchmarking, Code Generation	Code Available	2	5
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices	Jun 4, 2024	Text Generation	Code Available	2	5
CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs	Aug 29, 2023	CPU, GPU	Code Available	2	5
Defending LLMs against Jailbreaking Attacks via Backtranslation	Feb 26, 2024	Language Modelling	Code Available	2	5
TabDDPM: Modelling Tabular Data with Diffusion Models	Sep 30, 2022	Denoising	Code Available	2	5
MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation	Sep 9, 2022	Segmentation, Semantic Segmentation	Code Available	2	5
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts	Nov 22, 2024	AI Agent, Language Modeling	Code Available	2	5
3D LiDAR Mapping in Dynamic Environments Using a 4D Implicit Neural Representation	May 6, 2024	Autonomous Vehicles, Decoder	Code Available	2	5
Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation	Jun 3, 2024	Image Generation, Text to Image Generation	Code Available	2	5
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding	Mar 27, 2024	Attribute, Decision Making	Code Available	2	5
RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion	Mar 10, 2024	Code Completion, Link Prediction	Code Available	2	5
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix	May 19, 2025		Code Available	2	5
Multi-Agent Trajectory Prediction with Difficulty-Guided Feature Enhancement Network	Jul 26, 2024	Autonomous Driving, Decoder	Code Available	2	5
SRGS: Super-Resolution 3D Gaussian Splatting	Apr 16, 2024	3DGS, NeRF	Code Available	2	5
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming	Apr 6, 2024	Adversarial Robustness, Dialogue Safety Prediction	Code Available	2	5
AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields	Jul 21, 2022	Novel View Synthesis	Code Available	2	5
Language Models can Self-Lengthen to Generate Long Texts	Oct 31, 2024	Text Generation	Code Available	2	5
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling	Apr 17, 2025	Hallucination	Code Available	2	5
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection	Dec 18, 2024		Code Available	2	5
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing	Dec 21, 2022	Contrastive Learning, Drug Design	Code Available	2	5
MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise	May 20, 2024		Code Available	2	5
VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks	Feb 21, 2024	Computational Efficiency, Object	Code Available	2	5
Lenia - Biology of Artificial Life	Dec 13, 2018	Artificial Life, Diversity	Code Available	2	5
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models	Jun 26, 2024	Chatbot, Red Teaming	Code Available	2	5