SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 651-700 of 659,983 papers

Title | Status | Hype
Helios: Real Real-Time Long Video Generation Model | - | 5
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters | - | 5
Rethinking the Design of Reinforcement Learning-Based Deep Research Agents | - | 5
World Action Models are Zero-shot Policies | - | 5
OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data | - | 5
FireRed-Image-Edit-1.0 Technical Report | - | 5
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery | - | 5
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning | - | 5
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE | - | 5
Kimi K2.5: Visual Agentic Intelligence | - | 5
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey | - | 5
SAMTok: Representing Any Mask with Two Words | - | 5
UQLM: A Python Package for Uncertainty Quantification in Large Language Models | Code | 5
skfolio: Portfolio Optimization in Python | Code | 5
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers | Code | 5
RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism | Code | 5
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Code | 5
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Code | 5
Matrix-Game: Interactive World Foundation Model | Code | 5
YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception | Code | 5
Show-o2: Improved Native Unified Multimodal Models | Code | 5
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model | Code | 5
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models | Code | 5
A quantum semantic framework for natural language processing | Code | 5
τ^2-Bench: Evaluating Conversational Agents in a Dual-Control Environment | Code | 5
LeVo: High-Quality Song Generation with Multi-Preference Alignment | Code | 5
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models | Code | 5
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance | Code | 5
Trajectory Prediction Meets Large Language Models: A Survey | Code | 5
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning | Code | 5
OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Code | 5
EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration | Code | 5
R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration | Code | 5
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards | Code | 5
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Code | 5
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents | Code | 5
Autoformalization in the Era of Large Language Models: A Survey | Code | 5
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset | Code | 5
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement | Code | 5
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models | Code | 5
BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models | Code | 5
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention | Code | 5
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification | Code | 5
SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition | Code | 5
Benchmarking the Myopic Trap: Positional Bias in Information Retrieval | Code | 5
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning | Code | 5
Meta-World+: An Improved, Standardized, RL Benchmark | Code | 5
Group-in-Group Policy Optimization for LLM Agent Training | Code | 5
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset | Code | 5
DanceGRPO: Unleashing GRPO on Visual Generation | Code | 5
Page 14 of 13200