SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 2451–2500 of 659,983 papers

Title | Status | Hype
Stable Deep Reinforcement Learning via Isotropic Gaussian Representations | - | 0
EchoGen: Cycle-Consistent Learning for Unified Layout-Image Generation and Understanding | - | 0
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models | - | 0
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment | - | 0
Surrogate Model for Heat Transfer Prediction in Impinging Jet Arrays using Dynamic Inlet/Outlet and Flow Rate Control | - | 0
Entity-Specific Cyber Risk Assessment using InsurTech Empowered Risk Factors | - | 0
MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents | - | 0
Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner | - | 0
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs) | Code | 0
Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning | - | 0
An Improved Model-Free Decision-Estimation Coefficient with Applications in Adversarial MDPs | - | 0
Bridging Earth and Space: A Survey on HAPS for Non-Terrestrial Networks | - | 0
Seeing Beyond the Image: ECG and Anatomical Knowledge-Guided Myocardial Scar Segmentation from Late Gadolinium-Enhanced Images | - | 0
DuoTeach: Dual Role Self-Teaching for Coarse-to-Fine Decision Coordination in Vision--Language Models | - | 0
Embedding Physical Reasoning into Diffusion-Based Shadow Generation | - | 0
GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation | - | 0
SF-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Question Answering | - | 0
Causality is Key for Interpretability Claims to Generalise | - | 0
Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure | - | 0
Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models | - | 0
MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems | Code | 0
SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory | - | 0
A Stability-Aware Frozen Euler Autoencoder for Physics-Informed Tracking in Continuum Mechanics (SAFE-PIT-CM) | - | 0
Spectral Edge Dynamics of Training Trajectories: Signal--Noise Geometry Across Scales | - | 0
AsgardBench -- Evaluating Visually Grounded Interactive Planning Under Minimal Feedback | - | 0
Generative Replica-Exchange: A Flow-based Framework for Accelerating Replica Exchange Simulations | - | 0
Q-Drift: Quantization-Aware Drift Correction for Diffusion Model Sampling | - | 0
STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling | - | 0
Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI | - | 0
Understanding Task Aggregation for Generalizable Ultrasound Foundation Models | - | 0
Learning-Augmented Algorithms for k-median via Online Learning | - | 0
ResNets of All Shapes and Sizes: Convergence of Training Dynamics in the Large-scale Limit | - | 0
VLM-AutoDrive: Post-Training Vision-Language Models for Safety-Critical Autonomous Driving Events | - | 0
Retrieval-Augmented LLMs for Security Incident Analysis | - | 0
Retrieval-Augmented LLM Agents: Learning to Learn from Experience | - | 0
A Computationally Efficient Learning of Artificial Intelligence System Reliability Considering Error Propagation | - | 0
MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasonning Models | - | 0
CORE: Robust Out-of-Distribution Detection via Confidence and Orthogonal Residual Scoring | - | 0
ALIGN: Adversarial Learning for Generalizable Speech Neuroprosthesis | - | 0
Interpretability without actionability: mechanistic methods cannot correct language model errors despite near-perfect internal representations | - | 0
Synthetic Data Generation for Training Diversified Commonsense Reasoning Models | - | 0
Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search | - | 0
Fundamental Limits of Neural Network Sparsification: Evidence from Catastrophic Interpretability Collapse | - | 0
When Validation Fails: Cross-Institutional Blood Pressure Prediction and the Limits of Electronic Health Record-Based Models | Code | 0
MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games | Code | 0
Auditing Preferences for Brands and Cultures in LLMs | - | 0
Access Controlled Website Interaction for Agentic AI with Delegated Critical Tasks | - | 0
CeRA: Breaking the Linear Ceiling of Low-Rank Adaptation via Manifold Expansion | - | 0
Continually self-improving AI | - | 0
Impact of automatic speech recognition quality on Alzheimer's disease detection from spontaneous speech: a reproducible benchmark study with lexical modeling and statistical validation | - | 0
Page 50 of 13,200