The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 2,476–2,500 of 661,570 papers

Title	Date	Tasks	Status	Hype
Distilling LLM Agent into Small Models with Retrieval and Code Tools	May 23, 2025	Action Generation, Domain Generalization	Code Available	3
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality	May 23, 2025	In-Context Learning, Token Reduction	Code Available	3
CLIMB: Class-imbalanced Learning Benchmark on Tabular Data	May 23, 2025		Code Available	3
MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems	May 22, 2025		Code Available	3
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models	May 22, 2025	Benchmarking, Fairness	Code Available	3
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning	May 22, 2025	Reinforcement Learning (RL)	Code Available	3
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO	May 22, 2025	Reinforcement Learning (RL)	Code Available	3
LaViDa: A Large Diffusion Language Model for Multimodal Understanding	May 22, 2025	Instruction Following, Language Modeling	Code Available	3
Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL	May 22, 2025	Natural Language Understanding, Reinforcement Learning (RL)	Code Available	3
Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning	May 22, 2025		Code Available	3
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models	May 22, 2025	Benchmarking, Instruction Following	Code Available	3
Training-Free Efficient Video Generation via Dynamic Token Carving	May 22, 2025	Denoising, Video Generation	Code Available	3
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space	May 21, 2025		Code Available	3
Distance Adaptive Beam Search for Provably Accurate Graph-Based Nearest Neighbor Search	May 21, 2025	Information Retrieval	Code Available	3
Efficient Agent Training for Computer Use	May 20, 2025		Code Available	3
OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking	May 20, 2025	Benchmarking	Code Available	3
General-Reasoner: Advancing LLM Reasoning Across All Domains	May 20, 2025	All, Math	Code Available	3
RLVR-World: Training World Models with Reinforcement Learning	May 20, 2025	reinforcement-learning, Reinforcement Learning	Code Available	3
MLZero: A Multi-Agent System for End-to-end Machine Learning Automation	May 20, 2025	AutoML, Code Generation	Code Available	3
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem	May 20, 2025	Mathematical Reasoning, scientific discovery	Code Available	3
This Time is Different: An Observability Perspective on Time Series Foundation Models	May 20, 2025	Decoder, Multivariate Time Series Forecasting	Code Available	3
From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery	May 19, 2025	Navigate, scientific discovery	Code Available	3
Thinkless: LLM Learns When to Think	May 19, 2025	GSM8K, Math	Code Available	3
ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning	May 19, 2025	Machine Translation, reinforcement-learning	Code Available	3
Harnessing the Universal Geometry of Embeddings	May 18, 2025	Attribute	Code Available	3