SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 2451–2460 of 474,278 papers

Title | Status | Hype
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models | Code | 3
EXP-Bench: Can AI Conduct AI Research Experiments? | Code | 3
MathArena: Evaluating LLMs on Uncontaminated Math Competitions | Code | 3
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge | Code | 3
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | Code | 3
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | Code | 3
MAGREF: Masked Guidance for Any-Reference Video Generation | Code | 3
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | Code | 3
TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning | Code | 3
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning | Code | 3