SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4126-4150 of 661570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Semi-supervised Shelter Mapping for WASH Accessibility Assessment in Rohingya Refugee Camps | | 0 |
| Reduced Density Matrices Through Machine Learning | | 0 |
| Safety-Preserving PTQ via Contrastive Alignment Loss | | 0 |
| A robust methodology for long-term sustainability evaluation of Machine Learning models | | 0 |
| Aligning Probabilistic Beliefs under Informative Missingness: LLM Steerability in Clinical Reasoning | | 0 |
| MagicWorld: Towards Long-Horizon Stability for Interactive Video World Exploration | | 0 |
| Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models | | 0 |
| WPT: World-to-Policy Transfer via Online World Model Distillation | | 0 |
| IRIS-SLAM: Unified Geo-Instance Representations for Robust Semantic Localization and Mapping | | 0 |
| Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration | | 0 |
| NRR-Core: Non-Resolution Reasoning as a Computational Framework for Contextual Identity and Ambiguity Preservation | | 0 |
| RADAR: Retrieval-Augmented Detector with Adversarial Refinement for Robust Fake News Detection | | 0 |
| A Comedy of Estimators: On KL Regularization in RL Training of LLMs | | 0 |
| VL-RouterBench: A Benchmark for Vision-Language Model Routing | | 0 |
| EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning | | 0 |
| What Patients Really Ask: Exploring the Effect of False Assumptions in Patient Information Seeking | | 0 |
| Generative Adversarial Networks for Resource State Generation | | 0 |
| EVM-QuestBench: An Execution-Grounded Benchmark for Natural-Language Transaction Code Generation | | 0 |
| APEX-SWE | | 0 |
| PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization | | 0 |
| GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler | | 0 |
| Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening | | 0 |
| YOLO26: An Analysis of NMS-Free End to End Framework for Real-Time Object Detection | | 0 |
| Unifying Heterogeneous Degradations: Uncertainty-Aware Diffusion Bridge Model for All-in-One Image Restoration | | 0 |
| Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning | | 0 |
Page 166 of 26463