SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 4601-4650 of 661,570 papers

Title | Status | Hype
Ask don't tell: Reducing sycophancy in large language models | | 0
Fixed Anchors Are Not Enough: Dynamic Retrieval and Persistent Homology for Dataset Distillation | | 0
Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning and Contextual Stochastic Optimization Framework | | 0
Is Seeing Believing? Evaluating Human Sensitivity to Synthetic Video | | 0
Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating AI Models | | 0
Med-DualLoRA: Local Adaptation of Foundation Models for 3D Cardiac MRI | | 0
AutothinkRAG: Complexity-Aware Control of Retrieval-Augmented Reasoning for Image-Text Interaction | | 0
Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls | | 0
Association of Progressive PPFE and Mortality in Lung Cancer Screening Cohorts | | 0
Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios | | 0
Automating Skill Acquisition through Large-Scale Mining of Open-Source Agentic Repositories: A Framework for Multi-Agent Procedural Knowledge Extraction | | 0
OrigamiBench: An Interactive Environment to Synthesize Flat-Foldable Origamis | | 0
Efficient Federated Conformal Prediction with Group-Conditional Guarantees | | 0
HindSight: Evaluating LLM-Generated Research Ideas via Future Impact | | 0
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning | | 0
Unified Removal of Raindrops and Reflections: A New Benchmark and A Novel Pipeline | | 0
More Test-Time Compute Can Hurt: Overestimation Bias in LLM Beam Search | | 0
Gym-V: A Unified Vision Environment System for Agentic Vision Research | | 0
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models | | 0
Unpaired Cross-Domain Calibration of DMSP to VIIRS Nighttime Light Data Based on CUT Network | | 0
Explainable machine learning workflows for radio astronomical data processing | | 0
Toward Experimentation-as-a-Service in 5G/6G: The Plaza6G Prototype for AI-Assisted Trials | | 0
D^3-RSMDE: 40× Faster and High-Fidelity Remote Sensing Monocular Depth Estimation | | 0
FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment | | 0
DynamicGate MLP Conditional Computation via Learned Structural Dropout and Input Dependent Gating for Functional Plasticity | | 0
Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy | | 0
FederatedFactory: Generative One-Shot Learning for Extremely Non-IID Distributed Scenarios | | 0
Prior-Informed Neural Network Initialization: A Spectral Approach for Function Parameterizing Architectures | | 0
DermaFlux: Synthetic Skin Lesion Generation with Rectified Flows for Enhanced Image Classification | | 0
PlotTwist: A Creative Plot Generation Framework with Small Language Models | | 0
RECOVER: Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery | | 0
Trained Persistent Memory for Frozen Encoder-Decoder LLMs: Six Architectural Methods | | 0
IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time | | 0
Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences | | 0
SF-Mamba: Rethinking State Space Model for Vision | | 0
An approximate graph elicits detonation lattice | | 0
3D Fourier-based Global Feature Extraction for Hyperspectral Image Classification | | 0
IRIS: A Real-World Benchmark for Inverse Recovery and Identification of Physical Dynamic Systems from Monocular Video | | 0
Capability-Guided Compression: Toward Interpretability-Aware Budget Allocation for Large Language Models | | 0
Visual Distraction Undermines Moral Reasoning in Vision-Language Models | | 0
TinyGLASS: Real-Time Self-Supervised In-Sensor Anomaly Detection | | 0
RetailBench: Evaluating Long-Horizon Autonomous Decision-Making and Strategy Stability of LLM Agents in Realistic Retail Environments | | 0
Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval | | 0
DynHD: Hallucination Detection for Diffusion Large Language Models via Denoising Dynamics Deviation Learning | | 0
GAP-MLLM: Geometry-Aligned Pre-training for Activating 3D Spatial Perception in Multimodal Large Language Models | | 0
DST-Net: A Dual-Stream Transformer with Illumination-Independent Feature Guidance and Multi-Scale Spatial Convolution for Low-Light Image Enhancement | | 0
AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents | | 0
Bridging the High-Frequency Data Gap: A Millisecond-Resolution Network Dataset for Advancing Time Series Foundation Models | | 0
FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data | | 0
Exploring different approaches to customize language models for domain-specific text-to-code generation | | 0
Page 93 of 13,232