SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 826–850 of 177,339 papers

Title | Status | Hype
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images | Code | 5
Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean | Code | 5
Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation | Code | 5
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation | Code | 5
The Vizier Gaussian Process Bandit Algorithm | Code | 5
Fundamental Components of Deep Learning: A category-theoretic approach | Code | 5
Magma: A Foundation Model for Multimodal AI Agents | Code | 5
LiveBench: A Challenging, Contamination-Limited LLM Benchmark | Code | 5
FuXi-2.0: Advancing machine learning weather forecasting model for practical applications | Code | 5
Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement | Code | 5
Neural Fields in Robotics: A Survey | Code | 5
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | Code | 5
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? | Code | 5
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models | Code | 5
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis | Code | 5
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Code | 5
ZeroSearch: Incentivize the Search Capability of LLMs without Searching | Code | 5
Show-o2: Improved Native Unified Multimodal Models | Code | 5
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards | Code | 5
DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models | Code | 5
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning | Code | 5
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral | Code | 5
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data | Code | 5
Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction | Code | 5
Showing Many Labels in Multi-label Classification Models: An Empirical Study of Adversarial Examples | Code | 5