SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 676–700 of 659,983 papers

Title | Status | Hype
LeVo: High-Quality Song Generation with Multi-Preference Alignment | Code | 5
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models | Code | 5
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance | Code | 5
Trajectory Prediction Meets Large Language Models: A Survey | Code | 5
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning | Code | 5
OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Code | 5
EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration | Code | 5
R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration | Code | 5
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards | Code | 5
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Code | 5
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents | Code | 5
Autoformalization in the Era of Large Language Models: A Survey | Code | 5
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset | Code | 5
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement | Code | 5
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models | Code | 5
BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models | Code | 5
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention | Code | 5
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification | Code | 5
SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition | Code | 5
Benchmarking the Myopic Trap: Positional Bias in Information Retrieval | Code | 5
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning | Code | 5
Meta-World+: An Improved, Standardized, RL Benchmark | Code | 5
Group-in-Group Policy Optimization for LLM Agent Training | Code | 5
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset | Code | 5
DanceGRPO: Unleashing GRPO on Visual Generation | Code | 5