The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 5151–5175 of 661570 papers

Title	Date	Tasks	Status	Hype
FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling	Jun 9, 2025	Density Estimation	Code Available	2
Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes	Jun 9, 2025	3DGS, NeRF	Code Available	2
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction	Jun 9, 2025	Reinforcement Learning (RL)	Code Available	2
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation	Jun 9, 2025	Image Generation	Code Available	2
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation	Jun 9, 2025	Quantization, Vision-Language-Action	Code Available	2
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization	Jun 9, 2025	Combinatorial Optimization, Memorization	Code Available	2
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs	Jun 8, 2025		Code Available	2
Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching	Jun 8, 2025		Code Available	2
Generating Long Semantic IDs in Parallel for Recommendation	Jun 6, 2025		Code Available	2
RecGPT: A Foundation Model for Sequential Recommendation	Jun 6, 2025	Decoder, model	Code Available	2
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories	Jun 5, 2025	Benchmarking, Optical Character Recognition	Code Available	2
VideoMolmo: Spatio-Temporal Grounding Meets Pointing	Jun 5, 2025	Autonomous Driving, Autonomous Navigation	Code Available	2
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos	Jun 5, 2025	GPU, Semantic Segmentation	Code Available	2
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning	Jun 5, 2025	Math, Mathematical Reasoning	Code Available	2
Kinetics: Rethinking Test-Time Scaling Laws	Jun 5, 2025		Code Available	2
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs	Jun 5, 2025		Code Available	2
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting	Jun 5, 2025	Autonomous Driving, NeRF	Code Available	2
Exploring Diffusion Transformer Designs via Grafting	Jun 5, 2025		Code Available	2
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model	Jun 5, 2025	Decoder, Image Generation	Code Available	2
A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search	Jun 5, 2025	Imitation Learning	Code Available	2
EMBER2024 -- A Benchmark Dataset for Holistic Evaluation of Malware Classifiers	Jun 5, 2025	Malware Analysis, Malware Classification	Code Available	2
Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets	Jun 5, 2025		Code Available	2
Search Arena: Analyzing Search-Augmented LLMs	Jun 5, 2025	Fact Checking	Code Available	2
Contrastive Flow Matching	Jun 5, 2025		Code Available	2
LeanExplore: A search engine for Lean 4 declarations	Jun 4, 2025	Automated Theorem Proving	Code Available	2