The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers


Showing 876–900 of 659,983 papers

Title	Date	Tasks	Status	Hype
A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?	Aug 9, 2024	Natural Language Queries, Text to SQL	Code Available	5
SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More	Aug 8, 2024	Image Segmentation, Medical Image Segmentation	Code Available	5
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters	Aug 6, 2024		Code Available	5
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid	Aug 4, 2024	document understanding	Code Available	5
Active Learning for Neural PDE Solvers	Aug 2, 2024	Active Learning	Code Available	5
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data	Aug 1, 2024		Code Available	5
MuJoCo MPC for Humanoid Control: Evaluation on HumanoidBench	Aug 1, 2024	Humanoid Control, MuJoCo	Code Available	5
Segment Anything for Videos: A Systematic Survey	Jul 31, 2024	Image Segmentation, Robot Manipulation Generalization	Code Available	5
Tora: Trajectory-oriented Diffusion Transformer for Video Generation	Jul 31, 2024	Video Compression, Video Generation	Code Available	5
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget	Jul 22, 2024	Mixture-of-Experts	Code Available	5
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models	Jul 21, 2024	All, Fashion Synthesis	Code Available	5
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems	Jul 17, 2024	Autonomous Web Navigation, Denoising	Code Available	5
IMAGDressing-v1: Customizable Virtual Dressing	Jul 17, 2024	Denoising, Image Generation	Code Available	5
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark	Jul 16, 2024	Diversity, Speaker Identification	Code Available	5
Semantic Operators: A Declarative Model for Rich, AI-based Data Processing	Jul 16, 2024	Extreme Multi-Label Classification, Fact Checking	Code Available	5
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval	Jul 16, 2024	Question Answering, Retrieval	Code Available	5
GRUtopia: Dream General Robots in a City at Scale	Jul 15, 2024	Language Modelling, Large Language Model	Code Available	5
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients	Jul 11, 2024	Quantization	Code Available	5
OffsetBias: Leveraging Debiased Data for Tuning Evaluators	Jul 9, 2024		Code Available	5
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI	Jul 9, 2024	Survey	Code Available	5
TAPVid-3D: A Benchmark for Tracking Any Point in 3D	Jul 8, 2024	Point Tracking	Code Available	5
Fast On-device LLM Inference with NPUs	Jul 8, 2024	CPU, GPU	Code Available	5
Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning	Jul 8, 2024		Code Available	5
Learning to (Learn at Test Time): RNNs with Expressive Hidden States	Jul 5, 2024	16k, 8k	Code Available	5
BM25S: Orders of magnitude faster lexical search via eager sparse scoring	Jul 4, 2024	Passage Retrieval, Retrieval	Code Available	5