SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8,401–8,450 of 661,570 papers

Title | Status | Hype
--- | --- | ---
Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models | | 0
ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models | | 0
TASER: Task-Aware Spectral Energy Refine for Backdoor Suppression in UAV Swarms Decentralized Federated Learning | | 0
Stochastic Port-Hamiltonian Neural Networks: Universal Approximation with Passivity Guarantees | | 0
Mitigating Frequency Learning Bias in Quantum Models via Multi-Stage Residual Learning | | 0
Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference | | 0
Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models | | 0
Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias | | 0
A Trust-Region Interior-Point Stochastic Sequential Quadratic Programming Method | | 0
4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video | | 0
AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models | | 0
The Prediction-Measurement Gap: Toward Meaning Representations as Scientific Instruments | | 0
Reason and Verify: A Framework for Faithful Retrieval-Augmented Generation | | 0
ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning | | 0
Compatibility at a Cost: Systematic Discovery and Exploitation of MCP Clause-Compliance Vulnerabilities | | 0
Stability and Robustness via Regularization: Bandit Inference via Regularized Stochastic Mirror Descent | | 0
Flexible Cutoff Learning: Optimizing Machine Learning Potentials After Training | | 0
ARCHE: Autoregressive Residual Compression with Hyperprior and Excitation | | 0
MCP-in-SoS: Risk assessment framework for open-source MCP servers | | 0
Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models | | 0
Actor-Accelerated Policy Dual Averaging for Reinforcement Learning in Continuous Action Spaces | | 0
Taming Score-Based Denoisers in ADMM: A Convergent Plug-and-Play Framework | | 0
ViDia2Std: A Parallel Corpus and Methods for Low-Resource Vietnamese Dialect-to-Standard Translation | | 0
Sabiá-4 Technical Report | | 0
SDSR: A Spectral Divide-and-Conquer Approach for Species Tree Reconstruction | | 0
An Automated Radiomics Framework for Postoperative Survival Prediction in Colorectal Liver Metastases using Preoperative MRI | | 0
A Diffusion Analysis of Policy Gradient for Stochastic Bandits | | 0
Why Does It Look There? Structured Explanations for Image Classification | | 0
S-GRADES -- Studying Generalization of Student Response Assessments in Diverse Evaluative Settings | | 0
One Adapter for All: Towards Unified Representation in Step-Imbalanced Class-Incremental Learning | | 0
SiMPO: Measure Matching for Online Diffusion Reinforcement Learning | | 0
Joint Imaging-ROI Representation Learning via Cross-View Contrastive Alignment for Brain Disorder Classification | | 0
Improving TabPFN's Synthetic Data Generation by Integrating Causal Structure | | 0
A Robust Deep Learning Framework for Bangla License Plate Recognition Using YOLO and Vision-Language OCR | | 0
Robust Post-Training for Generative Recommenders: Why Exponential Reward-Weighted SFT Outperforms RLHF | | 0
GSVD for Geometry-Grounded Dataset Comparison: An Alignment Angle Is All You Need | | 0
VCR: Variance-Driven Channel Recalibration for Robust Low-Light Enhancement | | 0
From Phase Prediction to Phase Design: A ReAct Agent Framework for High-Entropy Alloy Discovery | | 0
CR-Bench: Evaluating the Real-World Utility of AI Code Review Agents | | 0
OA-NBV: Occlusion-Aware Next-Best-View Planning for Human-Centered Active Perception on Mobile Robots | | 0
Unifying Logical and Physical Layout Representations via Heterogeneous Graphs for Circuit Congestion Prediction | | 0
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use | | 0
Quality-Driven Agentic Reasoning for LLM-Assisted Software Design: Questions-of-Thoughts (QoT) as a Time-Series Self-QA Chain | | 0
The Confidence Gate Theorem: When Should Ranked Decision Systems Abstain? | | 0
CLEAR-Mamba: Towards Accurate, Adaptive and Trustworthy Multi-Sequence Ophthalmic Angiography Classification | Code | 0
Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels | | 0
M^2-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs | Code | 0
An Interpretable Operator-Learning Model for Electric Field Profile Reconstruction in Discharges Based on the EFISH Method | | 0
Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms | | 0
Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Approach with Jump-Diffusion | | 0
Page 169 of 13,232