SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 101–150 of 474,278 papers

Title | Status | Hype
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | Code | 11
HybridFlow: A Flexible and Efficient RLHF Framework | Code | 11
PaperBanana: Automating Academic Illustration for AI Scientists | | 9
Qwen3-TTS Technical Report | | 9
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Code | 9
Moshi: a speech-text foundation model for real-time dialogue | Code | 9
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on | Code | 9
RWKV-7 "Goose" with Expressive Dynamic State Evolution | Code | 9
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | Code | 9
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer | Code | 9
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | Code | 9
FinRobot: AI Agent for Equity Research and Valuation with Large Language Models | Code | 9
Language agents achieve superhuman synthesis of scientific knowledge | Code | 9
Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework | Code | 9
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models | Code | 9
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Code | 9
ORPO: Monolithic Preference Optimization without Reference Model | Code | 9
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Code | 9
Sapiens: Foundation for Human Vision Models | Code | 9
SkyReels-V2: Infinite-length Film Generative Model | Code | 9
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Code | 9
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | Code | 9
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | Code | 9
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | Code | 9
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | Code | 9
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | Code | 9
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion | Code | 9
Liger Kernel: Efficient Triton Kernels for LLM Training | Code | 9
CogVLM2: Visual Language Models for Image and Video Understanding | Code | 9
SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection | Code | 9
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks | Code | 9
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | Code | 9
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | Code | 9
LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection | Code | 9
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Code | 9
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | Code | 9
Symbolic Learning Enables Self-Evolving Agents | Code | 9
Aviary: training language agents on challenging scientific tasks | Code | 9
Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research | Code | 9
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting | Code | 9
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark | Code | 9
YOLO-World: Real-Time Open-Vocabulary Object Detection | Code | 9
Yi: Open Foundation Models by 01.AI | Code | 9
Steering Language Models with Game-Theoretic Solvers | Code | 9
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Code | 9
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | Code | 9
LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model | Code | 9
AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | Code | 9
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Code | 9
Page 3 of 9,486