SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 5201-5225 of 661570 papers

Title | Status | Hype
AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems | | 0
Evaluating Causal Discovery Algorithms for Path-Specific Fairness and Utility in Healthcare | | 0
The Importance of Being Smoothly Calibrated | | 0
Automated Counting of Stacked Objects in Industrial Inspection | | 0
Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions | | 0
Lite Any Stereo: Efficient Zero-Shot Stereo Matching | | 0
daVinci-Env: Open SWE Environment Synthesis at Scale | | 0
Intelligent Co-Design: An Interactive LLM Framework for Interior Spatial Design via Multi-Modal Agents | | 0
Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI | Code | 0
Geometric framework for biological evolution | | 0
MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale | | 0
AGCD: Agent-Guided Cross-Modal Decoding for Weather Forecasting | | 0
Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty | | 0
Learnability with Partial Labels and Adaptive Nearest Neighbors | | 0
Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau Equilibrium | | 1
EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models | | 1
Revisiting Model Stitching In the Foundation Model Era | | 0
Neural Value Iteration | | 0
Self-Supervised ImageNet Representations for In Vivo Confocal Microscopy: Tortuosity Grading without Segmentation Maps | | 0
Pretraining and Benchmarking Modern Encoders for Latvian | | 0
Deep Reinforcement Learning for Fano Hypersurfaces | | 0
BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning | | 0
Seismic full-waveform inversion based on a physics-driven generative adversarial network | | 0
SRL-MAD: Structured Residual Latents for One-Class Morphing Attack Detection | | 0
Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks | | 0
Page 209 of 26463