The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4976–5000 of 661,570 papers

Title	Date	Tasks	Status	Hype
Self-Refining Video Sampling	Jan 26, 2026		Unverified	2
DeFM: Learning Foundation Representations from Depth for Robotics	Jan 26, 2026		Unverified	2
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control	Jan 26, 2026		Unverified	2
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning	Jan 25, 2026		Unverified	2
BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction	Jan 24, 2026		Unverified	2
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents	Jan 23, 2026		Unverified	2
Q-learning with Adjoint Matching	Jan 23, 2026		Unverified	2
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model	Jan 23, 2026		Unverified	2
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding	Jan 23, 2026		Unverified	2
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks	Jan 22, 2026		Unverified	2
Boosting Generative Image Modeling via Joint Image-Feature Synthesis	Jan 22, 2026		Unverified	2
GutenOCR: A Grounded Vision-Language Front-End for Documents	Jan 22, 2026		Unverified	2
BPMN Assistant: An LLM-Based Approach to Business Process Modeling	Jan 22, 2026		Unverified	2
Rethinking Video Generation Model for the Embodied World	Jan 21, 2026		Unverified	2
Adaptive Multi-Agent Reasoning via Automated Workflow Generation	Jul 18, 2025		Code Available	2
CharaConsist: Fine-Grained Consistent Character Generation	Jul 15, 2025	Consistent Character Generation, Image Generation	Code Available	2
Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation	Jul 15, 2025	Image Segmentation, Segmentation	Code Available	2
Seq vs Seq: An Open Suite of Paired Encoders and Decoders	Jul 15, 2025	Decoder, Large Language Model	Code Available	2
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering	Jul 15, 2025	Benchmarking, Instruction Following	Code Available	2
SystolicAttention: Fusing FlashAttention within a Single Systolic Array	Jul 15, 2025	Scheduling	Code Available	2
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs	Jul 15, 2025	Code Generation, Safety Alignment	Code Available	2
Vision Language Action Models in Robotic Manipulation: A Systematic Review	Jul 14, 2025	Dataset Generation, Natural Language Understanding	Code Available	2
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization	Jul 14, 2025	2k, Image Generation	Code Available	2
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards	Jul 12, 2025		Code Available	2
I^2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting	Jul 12, 2025	Autonomous Driving, Computational Efficiency	Code Available	2