The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers


Showing 2,626–2,650 of 177,340 papers

Title	Date	Tasks	Status	Hype	Score
BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment	Nov 25, 2024	Language Modeling, Language Modelling	Code Available	3	5
RSMamba: Remote Sensing Image Classification with State Space Model	Mar 28, 2024	Classification, image-classification	Code Available	3	5
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models	Aug 15, 2024		Code Available	3	5
Proteina: Scaling Flow-based Protein Structure Generative Models	Mar 2, 2025	Protein Design	Code Available	3	5
A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning	Jun 9, 2024	Language Modeling, Language Modelling	Code Available	3	5
AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking	Jul 23, 2024	Benchmarking, Transfer Learning	Code Available	3	5
Self-rewarding correction for mathematical reasoning	Feb 26, 2025	Mathematical Reasoning	Code Available	3	5
Moving Object Segmentation: All You Need Is SAM (and Flow)	Apr 18, 2024	All, Motion Segmentation	Code Available	3	5
MDCrow: Automating Molecular Dynamics Workflows with Large Language Models	Feb 13, 2025		Code Available	3	5
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations	Jun 20, 2020	Quantization, Self-Supervised Learning	Code Available	3	5
Prompt-to-Leaderboard	Feb 20, 2025	Chatbot, Language Modeling	Code Available	3	5
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation	Apr 11, 2025	Decoder, Image Generation	Code Available	3	5
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents	Jun 7, 2024	Natural Language Understanding	Code Available	3	5
PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts	Oct 17, 2017	General Classification, Sentence	Code Available	3	5
BigGait: Learning Gait Representation You Want by Large Vision Models	Feb 29, 2024	Gait Recognition	Code Available	3	5
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents	Oct 3, 2024	Autonomous Driving, Backdoor Attack	Code Available	3	5
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers	Mar 10, 2025		Code Available	3	5
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models	Aug 2, 2024	Multimodal Reasoning, Multiple-choice	Code Available	3	5
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey	Mar 3, 2024	Property Prediction	Code Available	3	5
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition	Feb 22, 2023	3D Human Reconstruction, global-optimization	Code Available	3	5
OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics	May 23, 2025	Chart Understanding, object-detection	Code Available	3	5
nnInteractive: Redefining 3D Promptable Segmentation	Mar 11, 2025	Benchmarking, Interactive Segmentation	Code Available	3	5
3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation	Oct 16, 2024	Attribute, Image Generation	Code Available	3	5
Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework	Oct 28, 2024	Image Generation, Image Manipulation	Code Available	3	5
Ai2 Scholar QA: Organized Literature Synthesis with Attribution	Apr 15, 2025	Question Answering, Retrieval	Code Available	3	5