The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 2001–2025 of 661,570 papers

Title	Date	Tasks	Status	Hype
TerraTorch: The Geospatial Foundation Models Toolkit	Mar 26, 2025	Benchmarking, Decoder	Code Available	4
Video-R1: Reinforcing Video Reasoning in MLLMs	Mar 27, 2025	MVBench, Reinforcement Learning (RL)	Code Available	4
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement	Jun 9, 2025	Music Generation	Code Available	4
SpatialTrackerV2: 3D Point Tracking Made Easy	Jul 16, 2025	3D Reconstruction, Camera Pose Estimation	Code Available	4
Proactive Detection of Voice Cloning with Localized Watermarking	Jan 30, 2024	Voice Cloning	Code Available	4
Eliciting Latent Predictions from Transformers with the Tuned Lens	Mar 14, 2023	Language Modelling	Code Available	4
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming	Feb 22, 2025	backdoor defense	Code Available	4
Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web	Aug 26, 2024	Decision Making, Multi-class Classification	Code Available	4
Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets	Mar 9, 2022	Benchmarking, Graph Regression	Code Available	4
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment	Oct 3, 2023	Audio Classification, Contrastive Learning	Code Available	4
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement	Oct 26, 2024	Large Language Model	Code Available	4
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization	Mar 13, 2025	Multimodal Reasoning	Code Available	4
Recurrent Partial Kernel Network for Efficient Optical Flow Estimation	Feb 1, 2024	Optical Flow Estimation	Code Available	4
DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality	Oct 25, 2022	Deep Reinforcement Learning, GPU	Code Available	4
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning	Mar 20, 2025	Decision Making, Language Modeling	Code Available	4
Are Transformers Effective for Time Series Forecasting?	May 26, 2022	Anomaly Detection, Relation Extraction	Code Available	4
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models	May 8, 2025	Multimodal Reasoning	Code Available	4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation	Dec 4, 2023	Depth Estimation, GPU	Code Available	4
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot	Jan 2, 2023	Common Sense Reasoning, Language Modelling	Code Available	4
AlignScore: Evaluating Factual Consistency with a Unified Alignment Function	May 26, 2023	Fact Verification, Information Retrieval	Code Available	4
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos	Mar 26, 2024	3D Human Pose Estimation	Code Available	4
TableGPT2: A Large Multimodal Model with Tabular Data Integration	Nov 4, 2024	Benchmarking, Data Integration	Code Available	4
Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration	Dec 19, 2024	Human-Object Interaction Detection, motion retargeting	Code Available	4
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds	Jul 1, 2024	Audio Generation, Video Alignment	Code Available	4
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering	Apr 26, 2024	2k, Question Answering	Code Available	4