SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 2126-2150 of 177340 papers

Title | Status | Hype
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Code | 4
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation | Code | 4
CitationMap: A Python Tool to Identify and Visualize Your Google Scholar Citations Around the World | Code | 4
Real-time volumetric rendering of dynamic humans | Code | 4
Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces | Code | 4
DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection | Code | 4
Inductive Moment Matching | Code | 4
Polysemous codes | Code | 4
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? | Code | 4
RUMI: Rummaging Using Mutual Information | Code | 4
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks | Code | 4
A General Theoretical Paradigm to Understand Learning from Human Preferences | Code | 4
Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry | Code | 4
MUSE: Machine Unlearning Six-Way Evaluation for Language Models | Code | 4
Stock Price Prediction via Discovering Multi-Frequency Trading Patterns | Code | 4
The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence | Code | 4
Fast Transformer Decoding: One Write-Head is All You Need | Code | 4
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Code | 4
DisCo-DSO: Coupling Discrete and Continuous Optimization for Efficient Generative Design in Hybrid Spaces | Code | 4
Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms | Code | 4
Tiny-PULP-Dronets: Squeezing Neural Networks for Faster and Lighter Inference on Multi-Tasking Autonomous Nano-Drones | Code | 4
ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding | Code | 4
PointVLA: Injecting the 3D World into Vision-Language-Action Models | Code | 4
ViViD: Video Virtual Try-on using Diffusion Models | Code | 4
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image | Code | 4
Page 86 of 7094