The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8,701–8,725 of 474,278 papers

Title	Date	Status
MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning	Oct 16, 2025	Code Available
Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs	Oct 16, 2025	Code Available
Reinforcement Learning for Unsupervised Domain Adaptation in Spatio-Temporal Echocardiography Segmentation	Oct 16, 2025	Code Available
Pruning Overparameterized Multi-Task Networks for Degraded Web Image Restoration	Oct 16, 2025	Code Available
EuroMineNet: A Multitemporal Sentinel-2 Benchmark for Spatiotemporal Mining Footprint Analysis in the European Union (2015-2024)	Oct 16, 2025	Code Available
Where are the Whales: A Human-in-the-loop Detection Method for Identifying Whales in High-resolution Satellite Imagery	Oct 16, 2025	Code Available
BioMedSearch: A Multi-Source Biomedical Retrieval Framework Based on LLMs	Oct 15, 2025	Code Available
Text Anomaly Detection with Simplified Isolation Kernel	Oct 15, 2025	Code Available
ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models	Oct 15, 2025	Unverified
LLMs Can Get "Brain Rot"!	Oct 15, 2025	Unverified
Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism	Oct 15, 2025	Code Available
VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning	Oct 15, 2025	Unverified
LLM-guided Hierarchical Retrieval	Oct 15, 2025	Unverified
Two Heads Are Better Than One: Audio-Visual Speech Error Correction with Dual Hypotheses	Oct 15, 2025	Code Available
Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning	Oct 15, 2025	Unverified
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models	Oct 15, 2025	Unverified
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games	Oct 15, 2025	Unverified
CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning	Oct 15, 2025	Unverified
A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining	Oct 15, 2025	Unverified
AutoPR: Let's Automate Your Academic Promotion!	Oct 15, 2025	Unverified
VLA-0: Building State-of-the-Art VLAs with Zero Modification	Oct 15, 2025	Unverified
Complementary Information Guided Occupancy Prediction via Multi-Level Representation Fusion	Oct 15, 2025	Code Available
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math	Oct 15, 2025	Unverified
FlashWorld: High-quality 3D Scene Generation within Seconds	Oct 15, 2025	Unverified
Trace Anything: Representing Any Video in 4D via Trajectory Fields	Oct 15, 2025	Unverified