SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 4976-5000 of 661570 papers

Title | Status | Hype
Self-Refining Video Sampling |  | 2
DeFM: Learning Foundation Representations from Depth for Robotics |  | 2
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control |  | 2
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning |  | 2
BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction |  | 2
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents |  | 2
Q-learning with Adjoint Matching |  | 2
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model |  | 2
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding |  | 2
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks |  | 2
Boosting Generative Image Modeling via Joint Image-Feature Synthesis |  | 2
GutenOCR: A Grounded Vision-Language Front-End for Documents |  | 2
BPMN Assistant: An LLM-Based Approach to Business Process Modeling |  | 2
Rethinking Video Generation Model for the Embodied World |  | 2
Adaptive Multi-Agent Reasoning via Automated Workflow Generation | Code | 2
CharaConsist: Fine-Grained Consistent Character Generation | Code | 2
Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation | Code | 2
Seq vs Seq: An Open Suite of Paired Encoders and Decoders | Code | 2
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Code | 2
SystolicAttention: Fusing FlashAttention within a Single Systolic Array | Code | 2
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs | Code | 2
Vision Language Action Models in Robotic Manipulation: A Systematic Review | Code | 2
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Code | 2
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards | Code | 2
I^2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting | Code | 2
Page 200 of 26463