SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 9401-9450 of 661,570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Lindbladian Learning with Neural Differential Equations | | 0 |
| Vision Transformers that Never Stop Learning | | 0 |
| Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems | | 0 |
| ProgAgent: A Continual RL Agent with Progress-Aware Rewards | | 0 |
| OrdinalBench: A Benchmark Dataset for Diagnosing Generalization Limits in Ordinal Number Understanding of Vision-Language Models | | 0 |
| Neural Precoding in Complex Projective Spaces | | 0 |
| HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration | | 0 |
| Tracking Phenological Status and Ecological Interactions in a Hawaiian Cloud Forest Understory using Low-Cost Camera Traps and Visual Foundation Models | | 0 |
| An Efficient and Effective Evaluator for Text2SQL Models on Unseen and Unlabeled Data | Code | 0 |
| Column Generation for the Micro-Transit Zoning Problem | | 0 |
| Gradient Iterated Temporal-Difference Learning | | 0 |
| GazeShift: Unsupervised Gaze Estimation and Dataset for VR | Code | 0 |
| AI Misuse in Education Is a Measurement Problem: Toward a Learning Visibility Framework | | 0 |
| DistillGuard: Evaluating Defenses Against LLM Knowledge Distillation | | 0 |
| Training-free Temporal Object Tracking in Surgical Videos | | 0 |
| Intentional Deception as Controllable Capability in LLM Agents | | 0 |
| Generalized Reduction to the Isotropy for Flexible Equivariant Neural Fields | | 0 |
| EDMFormer: Genre-Specific Self-Supervised Learning for Music Structure Segmentation | | 0 |
| On the Formal Limits of Alignment Verification | | 0 |
| Clear, Compelling Arguments: Rethinking the Foundations of Frontier AI Safety Cases | | 0 |
| Benchmarking Large Language Models for Quebec Insurance: From Closed-Book to Retrieval-Augmented Generation | | 0 |
| MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy | Code | 0 |
| Hide and Find: A Distributed Adversarial Attack on Federated Graph Learning | | 0 |
| Beyond Surrogates: A Quantitative Analysis for Inter-Metric Relationships | | 0 |
| Dual-Metric Evaluation of Social Bias in Large Language Models: Evidence from an Underrepresented Nepali Cultural Context | | 0 |
| Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails | | 0 |
| Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization | | 0 |
| Toward Global Intent Inference for Human Motion by Inverse Reinforcement Learning | | 0 |
| Deliberative Dynamics and Value Alignment in LLM Debates | | 0 |
| Rigidity in LLM Bandits with Implications for Human-AI Dyads | | 0 |
| Step-Size Decay and Structural Stagnation in Greedy Sparse Learning | | 0 |
| AI-Driven Phase Identification from X-ray Hyperspectral Imaging of cycled Na-ion Cathode Materials | | 0 |
| Learning embeddings of non-linear PDEs: the Burgers' equation | | 0 |
| Skip to the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs | | 0 |
| Goal Alignment in LLM-Based User Simulators for Conversational AI | | 0 |
| ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning | | 0 |
| Model-Free Neural State Estimation in Nonlinear Dynamical Systems: Comparing Neural and Classical Filters | | 0 |
| Bitcoin Price Prediction using Machine Learning and Combinatorial Fusion Analysis | | 0 |
| Transferable Optimization Network for Cross-Domain Image Reconstruction | | 0 |
| DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models | | 0 |
| UniUncer: Unified Dynamic Static Uncertainty for End to End Driving | | 0 |
| FusionRegister: Every Infrared and Visible Image Fusion Deserves Registration | Code | 0 |
| Compressed-Domain-Aware Online Video Super-Resolution | Code | 0 |
| MWM: Mobile World Models for Action-Conditioned Consistent Prediction | Code | 0 |
| Flow Matching Meets Biology and Life Science: A Survey | Code | 0 |
| Reverse Distillation: Consistently Scaling Protein Language Model Representations | Code | 0 |
| Learning Context-Adaptive Motion Priors for Masked Motion Diffusion Models with Efficient Kinematic Attention Aggregation | Code | 0 |
| TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward | Code | 0 |
| AI Steerability 360: A Toolkit for Steering Large Language Models | Code | 0 |
| ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs | Code | 0 |
Page 189 of 13,232