The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers


Showing 2601–2625 of 661,570 papers

Title	Date	Tasks	Status	Hype
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search	Feb 13, 2024	Language Modeling, Language Modelling	Code Available	3
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes	Dec 31, 2024	Dynamic Reconstruction, Scene Flow Estimation	Code Available	3
Differentiable Data Augmentation with Kornia	Nov 19, 2020	Image Augmentation, Image Manipulation	Code Available	3
Supplementary Material for Efficient and Robust Automated Machine Learning	Jan 1, 2015	BIG-bench Machine Learning, Hyperparameter Optimization	Code Available	3
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks	Apr 16, 2022	Benchmarking, Instruction Following	Code Available	3
Why Do Multi-Agent LLM Systems Fail?	Mar 17, 2025		Code Available	3
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining	Mar 23, 2025	3DGS, Benchmarking	Code Available	3
Token Merging: Your ViT But Faster	Oct 17, 2022	Efficient ViTs	Code Available	3
StableVideo: Text-driven Consistency-aware Diffusion Video Editing	Aug 18, 2023	Video Editing	Code Available	3
Data-centric AI: Perspectives and Challenges	Jan 12, 2023		Code Available	3
Declarative Machine Learning Systems	Jul 16, 2021	BIG-bench Machine Learning	Code Available	3
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models	Mar 21, 2023	3D geometry, Text to 3D	Code Available	3
BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects	Mar 24, 2023	3D Object Detection, 3D Object Tracking	Code Available	3
TorchBench: Benchmarking PyTorch with High API Surface Coverage	Apr 27, 2023	Benchmarking, GPU	Code Available	3
How Can Recommender Systems Benefit from Large Language Models: A Survey	Jun 9, 2023	Ethics, Feature Engineering	Code Available	3
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering	Nov 30, 2023	Neural Rendering	Code Available	3
DeFlow: Decoder of Scene Flow Network in Autonomous Driving	Jan 29, 2024	Autonomous Driving, Decoder	Code Available	3
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models	Feb 10, 2024	CPU, GPU	Code Available	3
FaceXFormer: A Unified Transformer for Facial Analysis	Mar 19, 2024	Age and Gender Estimation, Age Estimation	Code Available	3
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset	May 17, 2024	16k, Benchmarking	Code Available	3
Vaporetto: Efficient Japanese Tokenization Based on Improved Pointwise Linear Classification	Jun 24, 2024		Code Available	3
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors	Nov 17, 2022	Activity Prediction, Activity Recognition	Code Available	3
A Note on the Prediction-Powered Bootstrap	May 28, 2024	Prediction	Code Available	3
S-Graphs 2.0 -- A Hierarchical-Semantic Optimization and Loop Closure for SLAM	Feb 25, 2025	global-optimization, Management	Code Available	3
AudioBench: A Universal Benchmark for Audio Large Language Models	Jun 23, 2024	Audio Scene Understanding, Instruction Following	Code Available	3