The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 2,451–2,475 of 661,570 papers

Title	Date	Tasks	Status	Hype
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models	Jun 1, 2025		Code Available	3
EXP-Bench: Can AI Conduct AI Research Experiments?	May 30, 2025		Code Available	3
MathArena: Evaluating LLMs on Uncontaminated Math Competitions	May 29, 2025	Math, Mathematical Reasoning	Code Available	3
TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning	May 29, 2025	In-Context Learning, State Space Models	Code Available	3
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model	May 29, 2025	Large Language Model, scientific discovery	Code Available	3
MAGREF: Masked Guidance for Any-Reference Video Generation	May 29, 2025	Human-Domain Subject-to-Video, Open-Domain Subject-to-Video	Code Available	3
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge	May 29, 2025	text-to-speech, Text to Speech	Code Available	3
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction	May 29, 2025	Question Answering	Code Available	3
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models	May 29, 2025	Autonomous Driving, Diagnostic	Code Available	3
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning	May 28, 2025	RAG	Code Available	3
NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation	May 27, 2025	Computational Efficiency, Graph Neural Network	Code Available	3
syftr: Pareto-Optimal Generative AI	May 26, 2025	Bayesian Optimization, RAG	Code Available	3
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers	May 26, 2025	Information Retrieval	Code Available	3
Learning to Reason without External Rewards	May 26, 2025	Code Generation, reinforcement-learning	Code Available	3
PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints	May 26, 2025	Deep Learning	Code Available	3
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation	May 26, 2025	Decoder, Language Modeling	Code Available	3
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction	May 26, 2025	3D Reconstruction, Spatial Reasoning	Code Available	3
FruitNeRF++: A Generalized Multi-Fruit Counting Method Utilizing Contrastive Learning and Neural Radiance Fields	May 26, 2025	Contrastive Learning	Code Available	3
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline	May 25, 2025	Speech Extraction, Speech Separation	Code Available	3
InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts	May 25, 2025	Chart Understanding, Question Answering	Code Available	3
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data	May 24, 2025	Image Stylization	Code Available	3
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning	May 24, 2025	GPU, Reinforcement Learning (RL)	Code Available	3
ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation	May 24, 2025	Benchmarking, Chart Understanding	Code Available	3
Distilling LLM Agent into Small Models with Retrieval and Code Tools	May 23, 2025	Action Generation, Domain Generalization	Code Available	3
OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics	May 23, 2025	Chart Understanding, object-detection	Code Available	3