SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 2461-2470 of 177,340 papers

Title | Status | Hype
--- | --- | ---
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | Code | 3
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Code | 3
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Code | 3
Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods | Code | 3
AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix | Code | 3
Language-based Audio Moment Retrieval | Code | 3
Unified Data Management and Comprehensive Performance Evaluation for Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark] | Code | 3
A Chinese Dataset for Evaluating the Safeguards in Large Language Models | Code | 3
Multi-Modality Representation Learning for Antibody-Antigen Interactions Prediction | Code | 3
Improving Alignment and Robustness with Circuit Breakers | Code | 3
Page 247 of 17,734