The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 5,801–5,825 of 661,570 papers

Title	Date	Status
POLCA: Stochastic Generative Optimization with LLM	Mar 16, 2026	Code Available
SpiralDiff: Spiral Diffusion with LoRA for RGB-to-RAW Conversion Across Cameras	Mar 16, 2026	Code Available
Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening	Mar 16, 2026	Code Available
PiGRAND: Physics-informed Graph Neural Diffusion for Intelligent Additive Manufacturing	Mar 16, 2026	Code Available
Invisible failures in human-AI interactions	Mar 16, 2026	Code Available
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer	Mar 16, 2026	Code Available
SlovKE: A Large-Scale Dataset and LLM Evaluation for Slovak Keyphrase Extraction	Mar 16, 2026	Code Available
InterveneBench: Benchmarking LLMs for Intervention Reasoning and Causal Study Design in Real Social Systems	Mar 16, 2026	Code Available
CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models	Mar 16, 2026	Code Available
Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation	Mar 16, 2026	Code Available
W2T: LoRA Weights Already Know What They Can Do	Mar 16, 2026	Code Available
Vietnamese Automatic Speech Recognition: A Revisit	Mar 16, 2026	Code Available
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents	Mar 16, 2026	Code Available
SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters	Mar 16, 2026	Code Available
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models	Mar 16, 2026	Code Available
Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection	Mar 16, 2026	Code Available
M2IR: Proactive All-in-One Image Restoration via Mamba-style Modulation and Mixture-of-Experts	Mar 16, 2026	Code Available
TopoVST: Toward Topology-fidelitous Vessel Skeleton Tracking	Mar 16, 2026	Code Available
MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal	Mar 16, 2026	Code Available
Mastering the Minority: An Uncertainty-guided Multi-Expert Framework for Challenging-tailed Sequence Learning	Mar 16, 2026	Code Available
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning	Mar 16, 2026	Code Available
Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought	Mar 16, 2026	Code Available
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents	Mar 16, 2026	Code Available
AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms	Mar 16, 2026	Code Available
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling	Mar 16, 2026	Code Available