The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers


Showing 7,076–7,100 of 474,278 papers

Title	Date	Status
ToC: Tree-of-Claims Search with Multi-Agent Language Models	Nov 21, 2025	Code Available
Diversity Has Always Been There in Your Visual Autoregressive Models	Nov 21, 2025	Code Available
Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift	Nov 21, 2025	Code Available
ARISE: Agentic Rubric-Guided Iterative Survey Engine for Automated Scholarly Paper Generation	Nov 21, 2025	Code Available
ResearStudio: A Human-Intervenable Framework for Building Controllable Deep-Research Agents	Nov 21, 2025	Code Available
T2I-RiskyPrompt: A Benchmark for Safety Evaluation, Attack, and Defense on Text-to-Image Model	Nov 21, 2025	Code Available
Automated Interpretable 2D Video Extraction from 3D Echocardiography	Nov 21, 2025	Code Available
Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs	Nov 21, 2025	Code Available
Continual Alignment for SAM: Rethinking Foundation Models for Medical Image Segmentation in Continual Learning	Nov 21, 2025	Code Available
Dual-domain Adaptation Networks for Realistic Image Super-resolution	Nov 21, 2025	Code Available
BiFingerPose: Bimodal Finger Pose Estimation for Touch Devices	Nov 21, 2025	Code Available
MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models	Nov 21, 2025	Code Available
Lost in Translation and Noise: A Deep Dive into the Failure Modes of VLMs on Real-World Tables	Nov 21, 2025	Code Available
Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks	Nov 20, 2025	Code Available
TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval	Nov 20, 2025	Unverified
STAMP: Spatial-Temporal Adapter with Multi-Head Pooling	Nov 20, 2025	Unverified
SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking	Nov 20, 2025	Unverified
SAM 3: Segment Anything with Concepts	Nov 20, 2025	Unverified
Fantastic Bugs and Where to Find Them in AI Benchmarks	Nov 20, 2025	Unverified
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models	Nov 20, 2025	Unverified
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark	Nov 20, 2025	Unverified
Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding	Nov 20, 2025	Unverified
SAM 3D: 3Dfy Anything in Images	Nov 20, 2025	Unverified
AutoBackdoor: Automating Backdoor Attacks via LLM Agents	Nov 20, 2025	Code Available
gfnx: Fast and Scalable Library for Generative Flow Networks in JAX	Nov 20, 2025	Code Available