SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers | 258,216 code links | 4,818 tasks

Papers

Showing 101-125 of 658,356 papers

Title | Status | Hype
WebWalker: Benchmarking LLMs in Web Traversal | Code | 11
SAM 2: Segment Anything in Images and Videos | Code | 11
Gymnasium: A Standard Interface for Reinforcement Learning Environments | Code | 11
Qwen3-TTS Technical Report |  | 9
PaperBanana: Automating Academic Illustration for AI Scientists |  | 9
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Code | 9
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models | Code | 9
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | Code | 9
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | Code | 9
Sapiens: Foundation for Human Vision Models | Code | 9
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion | Code | 9
SkyReels-V2: Infinite-length Film Generative Model | Code | 9
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Code | 9
ORPO: Monolithic Preference Optimization without Reference Model | Code | 9
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | Code | 9
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | Code | 9
Language agents achieve superhuman synthesis of scientific knowledge | Code | 9
YOLO-World: Real-Time Open-Vocabulary Object Detection | Code | 9
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark | Code | 9
Liger Kernel: Efficient Triton Kernels for LLM Training | Code | 9
CogVLM2: Visual Language Models for Image and Video Understanding | Code | 9
SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection | Code | 9
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks | Code | 9
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | Code | 9
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Code | 9
Page 5 of 26,335