SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 876-900 of 659,983 papers

Title | Status | Hype
A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going? | Code | 5
SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Code | 5
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | Code | 5
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid | Code | 5
Active Learning for Neural PDE Solvers | Code | 5
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data | Code | 5
MuJoCo MPC for Humanoid Control: Evaluation on HumanoidBench | Code | 5
Segment Anything for Videos: A Systematic Survey | Code | 5
Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Code | 5
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Code | 5
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models | Code | 5
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems | Code | 5
IMAGDressing-v1: Customizable Virtual Dressing | Code | 5
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Code | 5
Semantic Operators: A Declarative Model for Rich, AI-based Data Processing | Code | 5
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Code | 5
GRUtopia: Dream General Robots in a City at Scale | Code | 5
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients | Code | 5
OffsetBias: Leveraging Debiased Data for Tuning Evaluators | Code | 5
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI | Code | 5
TAPVid-3D: A Benchmark for Tracking Any Point in 3D | Code | 5
Fast On-device LLM Inference with NPUs | Code | 5
Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning | Code | 5
Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Code | 5
BM25S: Orders of magnitude faster lexical search via eager sparse scoring | Code | 5