SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers · 258,216 code links · 4,818 tasks

Papers

Showing 201-250 of 658,356 papers

Title | Status | Hype
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Code | 9
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models | Code | 9
ORPO: Monolithic Preference Optimization without Reference Model | Code | 9
LLM4Decompile: Decompiling Binary Code with Large Language Models | Code | 9
Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble | Code | 9
Yi: Open Foundation Models by 01.AI | Code | 9
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on | Code | 9
TripoSR: Fast 3D Object Reconstruction from a Single Image | Code | 9
World Model on Million-Length Video And Language With Blockwise RingAttention | Code | 9
UFO: A UI-Focused Agent for Windows OS Interaction | Code | 9
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Code | 9
Natural language guidance of high-fidelity text-to-speech with synthetic annotations | Code | 9
OLMo: Accelerating the Science of Language Models | Code | 9
YOLO-World: Real-Time Open-Vocabulary Object Detection | Code | 9
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks | Code | 9
Steering Language Models with Game-Theoretic Solvers | Code | 9
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark | Code | 9
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | Code | 9
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models | Code | 9
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | Code | 9
Perception Encoder: The best visual embeddings are not at the output of the network | Code | 8
GPT4All: An Ecosystem of Open Source Compressed Language Models | Code | 8
Llama 2: Open Foundation and Fine-Tuned Chat Models | Code | 8
Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition | Code | 8
DETRs Beat YOLOs on Real-time Object Detection | Code | 8
Robust Speech Recognition via Large-Scale Weak Supervision | Code | 8
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models | Code | 8
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis | Code | 8
Attention Residuals | - | 7
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning | - | 7
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem | - | 7
Pretraining Large Language Models with NVFP4 | - | 7
dLLM: Simple Diffusion Language Modeling | - | 7
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning | - | 7
SAM 3D Body: Robust Full-Body Human Mesh Recovery | - | 7
Qwen3-ASR Technical Report | - | 7
Advancing Open-source World Models | - | 7
Is Diversity All You Need for Scalable Robotic Manipulation? | Code | 7
Skywork-R1V3 Technical Report | Code | 7
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | Code | 7
OmniGen2: Exploration to Advanced Multimodal Generation | Code | 7
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets | Code | 7
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Code | 7
AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving | Code | 7
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation | Code | 7
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Code | 7
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model | Code | 7
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library | Code | 7
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark | Code | 7