SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 801–850 of 177,339 papers

Title | Status | Hype
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5
MambaIR: A Simple Baseline for Image Restoration with State-Space Model | Code | 5
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving | Code | 5
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Code | 5
TS3-Codec: Transformer-Based Simple Streaming Single Codec | Code | 5
MOSPAT: AutoML based Model Selection and Parameter Tuning for Time Series Anomaly Detection | Code | 5
On the reusability of samples in active learning | Code | 5
Robust Multivariate Time-Series Forecasting: Adversarial Attacks and Defense Mechanisms | Code | 5
AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time | Code | 5
DepthSplat: Connecting Gaussian Splatting and Depth | Code | 5
BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models | Code | 5
Does 'Deep Learning on a Data Diet' reproduce? Overall yes, but GraNd at Initialization does not | Code | 5
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment | Code | 5
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Code | 5
Infinite Photorealistic Worlds using Procedural Generation | Code | 5
ChatGPT MT: Competitive for High- (but not Low-) Resource Languages | Code | 5
YOLOR-Based Multi-Task Learning | Code | 5
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving | Code | 5
InstructPix2Pix: Learning to Follow Image Editing Instructions | Code | 5
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Code | 5
Human Gaussian Splatting: Real-time Rendering of Animatable Avatars | Code | 5
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | Code | 5
GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting | Code | 5
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations | Code | 5
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images | Code | 5
Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean | Code | 5
Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation | Code | 5
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation | Code | 5
The Vizier Gaussian Process Bandit Algorithm | Code | 5
Fundamental Components of Deep Learning: A category-theoretic approach | Code | 5
Magma: A Foundation Model for Multimodal AI Agents | Code | 5
LiveBench: A Challenging, Contamination-Limited LLM Benchmark | Code | 5
FuXi-2.0: Advancing machine learning weather forecasting model for practical applications | Code | 5
Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement | Code | 5
Neural Fields in Robotics: A Survey | Code | 5
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | Code | 5
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? | Code | 5
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models | Code | 5
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis | Code | 5
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Code | 5
ZeroSearch: Incentivize the Search Capability of LLMs without Searching | Code | 5
Show-o2: Improved Native Unified Multimodal Models | Code | 5
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards | Code | 5
DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models | Code | 5
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning | Code | 5
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral | Code | 5
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data | Code | 5
Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction | Code | 5
Showing Many Labels in Multi-label Classification Models: An Empirical Study of Adversarial Examples | Code | 5
IMAGDressing-v1: Customizable Virtual Dressing | Code | 5