The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers


Showing 9,501–9,525 of 474,278 papers

Title	Date	Status
Debunk the Myth of SFT Generalization	Sep 30, 2025	Code Available
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT	Sep 30, 2025	Code Available
Which Programming Language and Model Work Best With LLM-as-a-Judge For Code Retrieval?	Sep 30, 2025	Code Available
Flow Autoencoders are Effective Protein Tokenizers	Sep 30, 2025	Code Available
Noise-Guided Transport for Imitation Learning	Sep 30, 2025	Code Available
AutoLabs: Cognitive Multi-Agent Systems with Self-Correction for Autonomous Chemical Experimentation	Sep 30, 2025	Code Available
Annotation-Efficient Active Test-Time Adaptation with Conformal Prediction	Sep 30, 2025	Code Available
Controlled Generation for Private Synthetic Text	Sep 30, 2025	Unverified
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes	Sep 30, 2025	Unverified
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers	Sep 30, 2025	Code Available
MMPB: It's Time for Multi-Modal Personalization	Sep 30, 2025	Unverified
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning	Sep 30, 2025	Unverified
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting	Sep 30, 2025	Unverified
Swift: An Autoregressive Consistency Model for Efficient Weather Forecasting	Sep 30, 2025	Unverified
DeepCodeSeek: Real-Time API Retrieval for Context-Aware Code Generation	Sep 30, 2025	Unverified
Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm	Sep 30, 2025	Code Available
TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics	Sep 30, 2025	Code Available
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking	Sep 30, 2025	Code Available
Conda: Column-Normalized Adam for Training Large Language Models Faster	Sep 30, 2025	Code Available
Boundary-to-Region Supervision for Offline Safe Reinforcement Learning	Sep 30, 2025	Code Available
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs	Sep 30, 2025	Code Available
DescribeEarth: Describe Anything for Remote Sensing Images	Sep 30, 2025	Code Available
PIPer: On-Device Environment Setup via Online Reinforcement Learning	Sep 29, 2025	Code Available
A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory	Sep 29, 2025	Code Available
OIG-Bench: A Multi-Agent Annotated Benchmark for Multimodal One-Image Guides Understanding	Sep 29, 2025	Code Available