SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers · 258,216 code links · 4,818 tasks

Papers

Showing 251-300 of 658,356 papers

| Title | Status | Hype |
| --- | --- | --- |
| MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark | Code | 7 |
| Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | Code | 7 |
| OpenThoughts: Data Recipes for Reasoning Models | Code | 7 |
| AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7 |
| Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | Code | 7 |
| HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Code | 7 |
| Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers | Code | 7 |
| SageAttention2++: A More Efficient Implementation of SageAttention2 | Code | 7 |
| HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters | Code | 7 |
| SEW: Self-Evolving Agentic Workflows for Automated Code Generation | Code | 7 |
| AI-Researcher: Autonomous Scientific Innovation | Code | 7 |
| Speechless: Speech Instruction Training Without Speech for Low Resource Languages | Code | 7 |
| ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval | Code | 7 |
| An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents | Code | 7 |
| Visual Agentic Reinforcement Fine-Tuning | Code | 7 |
| Faster Video Diffusion with Trainable Sparse Attention | Code | 7 |
| MAGI-1: Autoregressive Video Generation at Scale | Code | 7 |
| Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting | Code | 7 |
| SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training | Code | 7 |
| Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7 |
| Fast Text-to-Audio Generation with Adversarial Post-Training | Code | 7 |
| HealthBench: Evaluating Large Language Models Towards Improved Human Health | Code | 7 |
| Embedding Atlas: Low-Friction, Interactive Embedding Visualization | Code | 7 |
| Flow-GRPO: Training Flow Matching Models via Online RL | Code | 7 |
| Practical Efficiency of Muon for Pretraining | Code | 7 |
| Kimi-Audio Technical Report | Code | 7 |
| RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | Code | 7 |
| Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning | Code | 7 |
| Step1X-Edit: A Practical Framework for General Image Editing | Code | 7 |
| Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | Code | 7 |
| TTRL: Test-Time Reinforcement Learning | Code | 7 |
| PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding | Code | 7 |
| Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Code | 7 |
| BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents | Code | 7 |
| Aligning Anime Video Generation with Human Feedback | Code | 7 |
| The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search | Code | 7 |
| A Scalable Approach to Clustering Embedding Projections | Code | 7 |
| Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought | Code | 7 |
| Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems | Code | 7 |
| Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model | Code | 7 |
| Large Language Model Agent: A Survey on Methodology, Applications and Challenges | Code | 7 |
| Open Deep Search: Democratizing Search with Open-source Reasoning Agents | Code | 7 |
| Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization | Code | 7 |
| Qwen2.5-Omni Technical Report | Code | 7 |
| Scaling Vision Pre-Training to 4K Resolution | Code | 7 |
| SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Code | 7 |
| Enhancing Fourier Neural Operators with Local Spatial Features | Code | 7 |
| InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity | Code | 7 |
| xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Code | 7 |
| LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds | Code | 7 |
Page 6 of 13,168