The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 7,251–7,275 of 474,278 papers

Title	Date	Status
MIMIC-4-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp for Risk Prediction	Nov 17, 2025	Code Available
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model	Nov 17, 2025	Unverified
Fine-grained Image Quality Assessment for Perceptual Image Restoration	Nov 17, 2025	Unverified
VocalBench-zh: Decomposing and Benchmarking the Speech Conversational Abilities in Mandarin Context	Nov 17, 2025	Code Available
WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance	Nov 17, 2025	Unverified
Are Graph Transformers Necessary? Efficient Long-Range Message Passing with Fractal Nodes in MPNNs	Nov 17, 2025	Unverified
One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow	Nov 17, 2025	Code Available
TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs	Nov 17, 2025	Code Available
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance	Nov 17, 2025	Unverified
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image	Nov 17, 2025	Unverified
UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity	Nov 17, 2025	Unverified
Segment Anything Across Shots: A Method and Benchmark	Nov 17, 2025	Unverified
Uncertainty-Calibrated Prediction of Randomly-Timed Biomarker Trajectories with Conformal Bands	Nov 17, 2025	Code Available
Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding	Nov 17, 2025	Code Available
LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering	Nov 17, 2025	Unverified
CAMAR: Continuous Actions Multi-Agent Routing	Nov 17, 2025	Unverified
SoK: Large Language Model Copyright Auditing via Fingerprinting	Nov 17, 2025	Code Available
Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework	Nov 17, 2025	Code Available
Efficient SAR Vessel Detection for FPGA-Based On-Satellite Sensing	Nov 17, 2025	Unverified
Enhancing All-to-X Backdoor Attacks with Optimized Target Class Mapping	Nov 17, 2025	Code Available
P1: Mastering Physics Olympiads with Reinforcement Learning	Nov 17, 2025	Unverified
ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference	Nov 17, 2025	Code Available
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization	Nov 17, 2025	Unverified
Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors	Nov 17, 2025	Code Available
Towards Methane Detection Onboard Satellites	Nov 17, 2025	Code Available