The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 1,826–1,850 of 661,570 papers

Title	Date	Status
Separating Diagnosis from Control: Auditable Policy Adaptation in Agent-Based Simulations with LLM-Based Diagnostics	Mar 24, 2026	Unverified
EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction	Mar 24, 2026	Unverified
Ran Score: a LLM-based Evaluation Score for Radiology Report Generation	Mar 24, 2026	Unverified
FixationFormer: Direct Utilization of Expert Gaze Trajectories for Chest X-Ray Classification	Mar 24, 2026	Unverified
Algorithmic warm starts for Hamiltonian Monte Carlo	Mar 24, 2026	Unverified
Beyond Binary Correctness: Scaling Evaluation of Long-Horizon Agents on Subjective Enterprise Tasks	Mar 24, 2026	Unverified
REALITrees: Rashomon Ensemble Active Learning for Interpretable Trees	Mar 24, 2026	Unverified
CLiGNet: Clinical Label-Interaction Graph Network for Medical Specialty Classification from Clinical Transcriptions	Mar 24, 2026	Unverified
PRISM: A Dual View of LLM Reasoning through Semantic Flow and Latent Computation	Mar 24, 2026	Unverified
KALAVAI: Predicting When Independent Specialist Fusion Works -- A Quantitative Model for Post-Hoc Cooperative LLM Training	Mar 24, 2026	Unverified
MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding	Mar 24, 2026	Unverified
Multimodal Industrial Anomaly Detection via Geometric Prior	Mar 24, 2026	Unverified
Predictive Photometric Uncertainty in Gaussian Splatting for Novel View Synthesis	Mar 24, 2026	Unverified
Quantum Random Forest for the Regression Problem	Mar 24, 2026	Unverified
ABSTRAL: Automatic Design of Multi-Agent Systems Through Iterative Refinement and Topology Optimization	Mar 24, 2026	Unverified
Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning	Mar 24, 2026	Unverified
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models	Mar 24, 2026	Unverified
When AI Shows Its Work, Is It Actually Working? Step-Level Evaluation Reveals Frontier Language Models Frequently Bypass Their Own Reasoning	Mar 24, 2026	Unverified
TDATR: Improving End-to-End Table Recognition via Table Detail-Aware Learning and Cell-Level Visual Alignment	Mar 24, 2026	Unverified
RadTimeline: Timeline Summarization for Longitudinal Radiological Lung Findings	Mar 24, 2026	Unverified
PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference	Mar 24, 2026	Unverified
A PAC-Bayesian approach to generalization for quantum models	Mar 24, 2026	Unverified
Zero-Shot Personalization of Objects via Textual Inversion	Mar 24, 2026	Unverified
Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents	Mar 24, 2026	Unverified
A Sobering Look at Tabular Data Generation via Probabilistic Circuits	Mar 24, 2026	Unverified