SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 2401-2450 of 659983 papers

Title | Status | Hype
How LLMs Distort Our Written Language | | 0
Efficient Dense Crowd Trajectory Prediction Via Dynamic Clustering | | 0
Enactor: From Traffic Simulators to Surrogate World Models | | 0
Modeling the human lexicon under temperature variations: linguistic factors, diversity and typicality in LLM word associations | | 0
Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL | | 0
Starting Off on the Wrong Foot: Pitfalls in Data Preparation | | 0
MicroVision: An Open Dataset and Benchmark Models for Detecting Vulnerable Road Users and Micromobility Vehicles | | 0
Tackling the Sign Problem in the Doped Hubbard Model with Normalizing Flows | | 0
Semantic Segmentation and Depth Estimation for Real-Time Lunar Surface Mapping Using 3D Gaussian Splatting | | 0
A Hybrid Conditional Diffusion-DeepONet Framework for High-Fidelity Stress Prediction in Hyperelastic Materials | | 0
Toward Reliable, Safe, and Secure LLMs for Scientific Applications | | 0
Gradient-Informed Temporal Sampling Improves Rollout Accuracy in PDE Surrogate Training | | 0
EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research | | 0
Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails | | 0
CycleCap: Improving VLMs Captioning Performance via Self-Supervised Cycle Consistency Fine-Tuning | | 0
Offload or Overload: A Platform Measurement Study of Mobile Robotic Manipulation Workloads | | 0
The Validity Gap in Health AI Evaluation: A Cross-Sectional Analysis of Benchmark Composition | | 0
Sparse3DTrack: Monocular 3D Object Tracking Using Sparse Supervision | | 0
Fast and Generalizable NeRF Architecture Selection for Satellite Scene Reconstruction | | 0
Unrolled Reconstruction with Integrated Super-Resolution for Accelerated 3D LGE MRI | | 0
Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum | | 0
Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration | | 0
Consumer-to-Clinical Language Shifts in Ambient AI Draft Notes and Clinician-Finalized Documentation: A Multi-level Analysis | | 0
A Family of Adaptive Activation Functions for Mitigating Failure Modes in Physics-Informed Neural Networks | | 0
FaithSteer-BENCH: A Deployment-Aligned Stress-Testing Benchmark for Inference-Time Steering | | 0
MemArchitect: A Policy Driven Memory Governance Layer | | 0
VISTA: Validation-Guided Integration of Spatial and Temporal Foundation Models with Anatomical Decoding for Rare-Pathology VCE Event Detection | | 0
Understanding the Theoretical Foundations of Deep Neural Networks through Differential Equations | | 0
Can LLMs Reason Like Automated Theorem Provers for Rust Verification? VCoT-Bench: Evaluating via Verification Chain of Thought | | 0
Shifting Uncertainty to Critical Moments: Towards Reliable Uncertainty Quantification for VLA Model | | 0
HRI-SA: A Multimodal Dataset for Online Assessment of Human Situational Awareness during Remote Human-Robot Teaming | | 0
Epistemic Generative Adversarial Networks | | 0
Large-Scale Analysis of Political Propaganda on Moltbook | | 0
From Noise to Signal: When Outliers Seed New Topics | | 0
Final Report for the Workshop on Robotics & AI in Medicine | | 0
From Binary to Bilingual: How the National Weather Service is Using Artificial Intelligence to Develop a Comprehensive Translation Program | | 0
CytoSyn: a Foundation Diffusion Model for Histopathology -- Tech Report | | 0
AGRI-Fidelity: Evaluating the Reliability of Listenable Explanations for Poultry Disease Detection | | 0
Privacy-Preserving Machine Learning for IoT: A Cross-Paradigm Survey and Future Roadmap | | 0
LICA: Layered Image Composition Annotations for Graphic Design Research | | 0
DarkDriving: A Real-World Day and Night Aligned Dataset for Autonomous Driving in the Dark Environment | | 0
Transfer Learning for Contextual Joint Assortment-Pricing under Cross-Market Heterogeneity | | 0
Intellectual Stewardship: Re-adapting Human Minds for Creative Knowledge Work in the Age of AI | | 0
LGESynthNet: Controlled Scar Synthesis for Improved Scar Segmentation in Cardiac LGE-MRI Imaging | | 0
Universal Skeleton Understanding via Differentiable Rendering and MLLMs | | 0
A Structured Nonparametric Framework for Nonlinear Accelerated Failure Time Models (KAN-AFT) | | 0
Constrained Hybrid Metaheuristic: A Universal Framework for Continuous Optimisation | | 0
Rule-Based Explanations for Retrieval-Augmented LLM Systems | | 0
LLM-Augmented Computational Phenotyping of Long Covid | | 0
Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction | | 0
Page 49 of 13200