SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 2591-2600 of 474,278 papers

Title | Status | Hype
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | Code | 3
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | Code | 3
Affordable AI Assistants with Knowledge Graph of Thoughts | Code | 3
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Code | 3
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | Code | 3
End-to-End Driving with Online Trajectory Evaluation via BEV World Model | Code | 3
YourBench: Easy Custom Evaluation Sets for Everyone | Code | 3
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction | Code | 3
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Code | 3
MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs | Code | 3
Page 260 of 47,428