SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 8401-8425 of 474,278 papers

Title | Status | Hype
Horizon Reduction Makes RL Scalable | Code | 0
PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs | Code | 0
Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing | Code | 0
Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods | Code | 0
X-Ego: Acquiring Team-Level Tactical Situational Awareness via Cross-Egocentric Contrastive Video Representation Learning | Code | 0
The Zero-Step Thinking: An Empirical Study of Mode Selection as Harder Early Exit in Reasoning Models | Code | 0
SCEESR: Semantic-Control Edge Enhancement for Diffusion-Based Super-Resolution | Code | 0
JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation | Code | 0
Balancing Rewards in Text Summarization: Multi-Objective Reinforcement Learning via HyperVolume Optimization | Code | 0
Graph Unlearning Meets Influence-aware Negative Preference Optimization | Code | 0
Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation | Code | 0
XBench: A Comprehensive Benchmark for Visual-Language Explanations in Chest Radiography | Code | 0
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning | Code | 0
Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation | Code | 0
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders | Code | 0
Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models | Code | 0
Towards Strong Certified Defense with Universal Asymmetric Randomization | Code | 0
Motion2Meaning: A Clinician-Centered Framework for Contestable LLM in Parkinson's Disease Gait Interpretation | Code | 0
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting | | 0
DiffGRM: Diffusion-based Generative Recommendation Model | Code | 0
Crucible: Quantifying the Potential of Control Algorithms through LLM Agents | Code | 0
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? | | 0
SimKO: Simple Pass@K Policy Optimization | | 0
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation | | 0
IF-VidCap: Can Video Caption Models Follow Instructions? | | 0