SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers, 258,108 code links, 4,818 tasks

Papers

Showing 101-150 of 658,356 papers

Title | Status | Hype
WebWalker: Benchmarking LLMs in Web Traversal | Code | 11
SAM 2: Segment Anything in Images and Videos | Code | 11
Gymnasium: A Standard Interface for Reinforcement Learning Environments | Code | 11
PaperBanana: Automating Academic Illustration for AI Scientists | - | 9
Qwen3-TTS Technical Report | - | 9
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on | Code | 9
Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework | Code | 9
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models | Code | 9
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Code | 9
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | Code | 9
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer | Code | 9
Sapiens: Foundation for Human Vision Models | Code | 9
SkyReels-V2: Infinite-length Film Generative Model | Code | 9
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Code | 9
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | Code | 9
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | Code | 9
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | Code | 9
Language agents achieve superhuman synthesis of scientific knowledge | Code | 9
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | Code | 9
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion | Code | 9
Liger Kernel: Efficient Triton Kernels for LLM Training | Code | 9
CogVLM2: Visual Language Models for Image and Video Understanding | Code | 9
SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection | Code | 9
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks | Code | 9
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | Code | 9
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Code | 9
ORPO: Monolithic Preference Optimization without Reference Model | Code | 9
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Code | 9
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | Code | 9
Symbolic Learning Enables Self-Evolving Agents | Code | 9
Aviary: training language agents on challenging scientific tasks | Code | 9
Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research | Code | 9
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting | Code | 9
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark | Code | 9
YOLO-World: Real-Time Open-Vocabulary Object Detection | Code | 9
Yi: Open Foundation Models by 01.AI | Code | 9
Steering Language Models with Game-Theoretic Solvers | Code | 9
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Code | 9
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | Code | 9
LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model | Code | 9
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | Code | 9
NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context? | Code | 9
YuE: Scaling Open Foundation Models for Long-Form Music Generation | Code | 9
Depth Anything V2 | Code | 9
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Code | 9
Visually Descriptive Language Model for Vector Graphics Reasoning | Code | 9
KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation | Code | 9
World Model on Million-Length Video And Language With Blockwise RingAttention | Code | 9
UFO2: The Desktop AgentOS | Code | 9
Page 3 of 13,168