The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 7976–8000 of 177340 papers

Title	Date	Tasks	Status	Hype	Score
MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues	Feb 22, 2024		Code Available	2	5
PPFlow: Target-aware Peptide Design with Torsional Flow Matching	Mar 5, 2024	Drug Design, Drug Discovery	Code Available	2	5
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems	Nov 16, 2023	RAG, Retrieval	Code Available	2	5
Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories	Apr 8, 2022	Motion Estimation, Object Tracking	Code Available	2	5
FreeTumor: Large-Scale Generative Tumor Synthesis in Computed Tomography Images for Improving Tumor Recognition	Feb 23, 2025	Computed Tomography (CT)	Code Available	2	5
UMBRAE: Unified Multimodal Brain Decoding	Apr 10, 2024	Brain Decoding, Language Modeling	Code Available	2	5
HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder	Mar 11, 2025	Autonomous Driving, Bench2Drive	Code Available	2	5
LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models	Sep 30, 2024	Fairness	Code Available	2	5
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!	Dec 5, 2023	Information Retrieval, Reranking	Code Available	2	5
TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials	Jun 10, 2023	Formation Energy	Code Available	2	5
Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement	Jun 17, 2024	Language Modeling, Language Modelling	Code Available	2	5
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models	Mar 21, 2025	Language Modeling, Language Modelling	Code Available	2	5
Pix2Poly: A Sequence Prediction Method for End-to-end Polygonal Building Footprint Extraction from Remote Sensing Imagery	Dec 10, 2024	Decoder, Extracting Buildings In Remote Sensing Images	Code Available	2	5
Multi-View Mesh Reconstruction with Neural Deferred Shading	Dec 8, 2022	3D Reconstruction, Multi-View 3D Reconstruction	Code Available	2	5
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction	Jan 12, 2024	Bandwidth Extension, CPU	Code Available	2	5
Room impulse response reconstruction with physics-informed deep learning	Jan 2, 2024	Deep Learning	Code Available	2	5
Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video	Jan 16, 2024	Image Generation, Image to 3D	Code Available	2	5
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code	Oct 10, 2024	Math, Mathematical Reasoning	Code Available	2	5
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate	Aug 14, 2023	Text Generation	Code Available	2	5
Multi-Programming Language Sandbox for LLMs	Oct 30, 2024		Code Available	2	5
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling	Oct 10, 2024	Language Modeling, Language Modelling	Code Available	2	5
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation	Sep 11, 2023	Autonomous Driving, Domain Generalization	Code Available	2	5
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients	Apr 14, 2025	Instruction Following	Code Available	2	5
Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback	Feb 6, 2024	Video-based Generative Performance Benchmarking	Code Available	2	5
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain	Jan 30, 2024	Image Comprehension, Instruction Following	Code Available	2	5