The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers


Showing 5401–5425 of 661,570 papers

Title	Date	Tasks	Status	Hype
A Tutorial on Structural Identifiability of Epidemic Models Using StructuralIdentifiability.jl	May 15, 2025	parameter estimation	Code Available	2
WorldPM: Scaling Human Preference Modeling	May 15, 2025	Language Modeling, Language Modelling	Code Available	2
MASS: Multi-Agent Simulation Scaling for Portfolio Construction	May 15, 2025		Code Available	2
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly	May 15, 2025	8k, Benchmarking	Code Available	2
MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models	May 15, 2025	General Knowledge, Prompt Engineering	Code Available	2
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt	May 14, 2025	Anomaly Detection, Anomaly Segmentation	Code Available	2
Recent Advances in Medical Imaging Segmentation: A Survey	May 14, 2025	Domain Adaptation, Few-Shot Learning	Code Available	2
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators	May 14, 2025	Spoken Dialogue Systems	Code Available	2
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"	May 14, 2025		Code Available	2
Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation	May 14, 2025	Anomaly Classification, Anomaly Detection	Code Available	2
MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning	May 14, 2025	Anomaly Detection, Anomaly Segmentation	Code Available	2
Behind Maya: Building a Multilingual Vision Language Model	May 13, 2025	Language Modeling, Language Modelling	Code Available	2
CodePDE: An Inference Framework for LLM-driven PDE Solver Generation	May 13, 2025	Code Generation	Code Available	2
Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement	May 13, 2025	Benchmarking, Language Modeling	Code Available	2
BAT: Benchmark for Auto-bidding Task	May 13, 2025		Code Available	2
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent	May 12, 2025	RAG, Reinforcement Learning (RL)	Code Available	2
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation	May 12, 2025	Language Modeling, Language Modelling	Code Available	2
YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models	May 12, 2025	Large Language Model, Sociology	Code Available	2
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving	May 12, 2025	Math, Mathematical Problem-Solving	Code Available	2
Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection	May 12, 2025	Anomaly Detection	Code Available	2
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models	May 12, 2025		Code Available	2
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering	May 12, 2025	Large Language Model, reinforcement-learning	Code Available	2
Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs	May 12, 2025	AI Agent, Knowledge Distillation	Code Available	2
Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule	May 12, 2025	Drug Design, Scheduling	Code Available	2
Unified Continuous Generative Models	May 12, 2025	Image Generation	Code Available	2