SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers, 248,104 code links, 4,818 tasks

Papers

Showing 801–825 of 659,983 papers

| Title | Status | Hype |
| --- | --- | --- |
| CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation | | 0 |
| mmFHE: mmWave Sensing with End-to-End Fully Homomorphic Encryption | | 0 |
| Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs | | 0 |
| SkillRouter: Retrieve-and-Rerank Skill Selection for LLM Agents at Scale | | 0 |
| MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding | | 0 |
| A Theoretical Framework for Energy-Aware Gradient Pruning in Federated Learning | | 0 |
| Color When It Counts: Grayscale-Guided Online Triggering for Always-On Streaming Video Sensing | | 0 |
| SPDE Methods for Nonparametric Bayesian Posterior Contraction and Laplace Approximation | | 0 |
| Wake Up to the Past: Using Memory to Model Fluid Wake Effects on Robots | | 0 |
| Functional Component Ablation Reveals Specialization Patterns in Hybrid Language Model Architectures | | 0 |
| Rashid: A Cipher-Based Framework for Exploring In-Context Language Learning | | 0 |
| OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection | | 0 |
| Sketch2CT: Multimodal Diffusion for Structure-Aware 3D Medical Volume Generation | | 0 |
| High Resolution Flood Extent Detection Using Deep Learning with Random Forest Derived Training Labels | | 0 |
| LLMON: An LLM-native Markup Language to Leverage Structure and Semantics at the LLM Interface | | 0 |
| Adversarial Vulnerabilities in Neural Operator Digital Twins: Gradient-Free Attacks on Nuclear Thermal-Hydraulic Surrogates | | 0 |
| Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion | | 0 |
| GraphRAG for Engineering Diagrams: ChatP&ID Enables LLM Interaction with P&IDs | | 0 |
| Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos | | 0 |
| Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment | | 0 |
| UrbanVGGT: Scalable Sidewalk Width Estimation from Street View Images | | 0 |
| AI Mental Models: Learned Intuition and Deliberation in a Bounded Neural Architecture | | 0 |
| Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling | | 0 |
| MIOFlow 2.0: A unified framework for inferring cellular stochastic dynamics from single cell and spatial transcriptomics data | | 0 |
| Reddit After Roe: A Computational Analysis of Abortion Narratives and Barriers in the Wake of Dobbs | | 0 |