SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 1826-1850 of 661,570 papers

Title | Status | Hype
Separating Diagnosis from Control: Auditable Policy Adaptation in Agent-Based Simulations with LLM-Based Diagnostics | | 0
EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction | | 0
Ran Score: a LLM-based Evaluation Score for Radiology Report Generation | | 0
FixationFormer: Direct Utilization of Expert Gaze Trajectories for Chest X-Ray Classification | | 0
Algorithmic warm starts for Hamiltonian Monte Carlo | | 0
Beyond Binary Correctness: Scaling Evaluation of Long-Horizon Agents on Subjective Enterprise Tasks | | 0
REALITrees: Rashomon Ensemble Active Learning for Interpretable Trees | | 0
CLiGNet: Clinical Label-Interaction Graph Network for Medical Specialty Classification from Clinical Transcriptions | | 0
PRISM: A Dual View of LLM Reasoning through Semantic Flow and Latent Computation | | 0
KALAVAI: Predicting When Independent Specialist Fusion Works -- A Quantitative Model for Post-Hoc Cooperative LLM Training | | 0
MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding | | 0
Multimodal Industrial Anomaly Detection via Geometric Prior | | 0
Predictive Photometric Uncertainty in Gaussian Splatting for Novel View Synthesis | | 0
Quantum Random Forest for the Regression Problem | | 0
ABSTRAL: Automatic Design of Multi-Agent Systems Through Iterative Refinement and Topology Optimization | | 0
Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning | | 0
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models | | 0
When AI Shows Its Work, Is It Actually Working? Step-Level Evaluation Reveals Frontier Language Models Frequently Bypass Their Own Reasoning | | 0
TDATR: Improving End-to-End Table Recognition via Table Detail-Aware Learning and Cell-Level Visual Alignment | | 0
RadTimeline: Timeline Summarization for Longitudinal Radiological Lung Findings | | 0
PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference | | 0
A PAC-Bayesian approach to generalization for quantum models | | 0
Zero-Shot Personalization of Objects via Textual Inversion | | 0
Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents | | 0
A Sobering Look at Tabular Data Generation via Probabilistic Circuits | | 0