SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 1951 to 2000 of 659983 papers

Title | Status | Hype
PufferLib: Making Reinforcement Learning Libraries and Environments Play Nice | Code | 4
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation | Code | 4
Elucidating the Design Space of Diffusion-Based Generative Models | Code | 4
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | Code | 4
BitNet a4.8: 4-bit Activations for 1-bit LLMs | Code | 4
A Survey on Vision-Language-Action Models for Embodied AI | Code | 4
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Code | 4
DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio | Code | 4
Efficient Few-Shot Learning Without Prompts | Code | 4
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | Code | 4
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning | Code | 4
Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering | Code | 4
Generalizable and Animatable Gaussian Head Avatar | Code | 4
Deep Industrial Image Anomaly Detection: A Survey | Code | 4
PharMolixFM: All-Atom Foundation Models for Molecular Modeling and Generation | Code | 4
Transformer for Object Re-Identification: A Survey | Code | 4
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation | Code | 4
FLEX: FLEXible Federated Learning Framework | Code | 4
Deep Multi-Frame Filtering for Hearing Aids | Code | 4
Neuralangelo: High-Fidelity Neural Surface Reconstruction | Code | 4
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond | Code | 4
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor | Code | 4
Training Software Engineering Agents and Verifiers with SWE-Gym | Code | 4
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control | Code | 4
pgmpy: A Python Toolkit for Bayesian Networks | Code | 4
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Code | 4
Rethinking Inductive Biases for Surface Normal Estimation | Code | 4
UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Code | 4
InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write | Code | 4
Long-form factuality in large language models | Code | 4
Molecular-driven Foundation Model for Oncologic Pathology | Code | 4
Natural Language Generation | Code | 4
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 | Code | 4
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents | Code | 4
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling | Code | 4
3D-aware Conditional Image Synthesis | Code | 4
NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning | Code | 4
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Code | 4
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | Code | 4
Pen and Paper Exercises in Machine Learning | Code | 4
RewardBench: Evaluating Reward Models for Language Modeling | Code | 4
Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model | Code | 4
Taming Rectified Flow for Inversion and Editing | Code | 4
A Foundation Model for Zero-shot Logical Query Reasoning | Code | 4
DoRA: Weight-Decomposed Low-Rank Adaptation | Code | 4
Blind Image Deblurring with Unknown Kernel Size and Substantial Noise | Code | 4
Human Motion Diffusion Model | Code | 4
Fast Inference of Mixture-of-Experts Language Models with Offloading | Code | 4
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model | Code | 4
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation | Code | 4
Page 40 of 13200