SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4601-4625 of 661570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Ask don't tell: Reducing sycophancy in large language models | | 0 |
| Fixed Anchors Are Not Enough: Dynamic Retrieval and Persistent Homology for Dataset Distillation | | 0 |
| Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning and Contextual Stochastic Optimization Framework | | 0 |
| Is Seeing Believing? Evaluating Human Sensitivity to Synthetic Video | | 0 |
| Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating AI Models | | 0 |
| Med-DualLoRA: Local Adaptation of Foundation Models for 3D Cardiac MRI | | 0 |
| AutothinkRAG: Complexity-Aware Control of Retrieval-Augmented Reasoning for Image-Text Interaction | | 0 |
| Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls | | 0 |
| Association of Progressive PPFE and Mortality in Lung Cancer Screening Cohorts | | 0 |
| Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios | | 0 |
| Automating Skill Acquisition through Large-Scale Mining of Open-Source Agentic Repositories: A Framework for Multi-Agent Procedural Knowledge Extraction | | 0 |
| OrigamiBench: An Interactive Environment to Synthesize Flat-Foldable Origamis | | 0 |
| Efficient Federated Conformal Prediction with Group-Conditional Guarantees | | 0 |
| HindSight: Evaluating LLM-Generated Research Ideas via Future Impact | | 0 |
| V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning | | 0 |
| Unified Removal of Raindrops and Reflections: A New Benchmark and A Novel Pipeline | | 0 |
| More Test-Time Compute Can Hurt: Overestimation Bias in LLM Beam Search | | 0 |
| Gym-V: A Unified Vision Environment System for Agentic Vision Research | | 0 |
| Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models | | 0 |
| Unpaired Cross-Domain Calibration of DMSP to VIIRS Nighttime Light Data Based on CUT Network | | 0 |
| Explainable machine learning workflows for radio astronomical data processing | | 0 |
| Toward Experimentation-as-a-Service in 5G/6G: The Plaza6G Prototype for AI-Assisted Trials | | 0 |
| D^3-RSMDE: 40× Faster and High-Fidelity Remote Sensing Monocular Depth Estimation | | 0 |
| FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment | | 0 |
| DynamicGate MLP Conditional Computation via Learned Structural Dropout and Input Dependent Gating for Functional Plasticity | | 0 |