SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 1401–1450 of 659,983 papers

Title | Status | Hype
Thermal is Always Wild: Characterizing and Addressing Challenges in Thermal-Only Novel View Synthesis | | 0
Solver-Aided Verification of Policy Compliance in Tool-Augmented LLM Agents | | 0
Policies Permitting LLM Use for Polishing Peer Reviews Are Currently Not Enforceable | | 0
SDE-Driven Spatio-Temporal Hypergraph Neural Networks for Irregular Longitudinal fMRI Connectome Modeling in Alzheimer's Disease | | 0
Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret | | 0
From Data to Laws: Neural Discovery of Conservation Laws Without False Positives | | 0
CREG: Compass Relational Evidence for Interpreting Spatial Reasoning in Vision-Language Models | | 0
Profiling learners' affective engagement: Emotion AI, intercultural pragmatics, and language learning | | 0
Spatio-Temporal Grid Intelligence: A Hybrid Graph Neural Network and LSTM Framework for Robust Electricity Theft Detection | | 0
AE-LLM: Adaptive Efficiency Optimization for Large Language Models | | 0
PARHAF, a human-authored corpus of clinical reports for fictitious patients in French | | 0
Meeting in the Middle: A Co-Design Paradigm for FHE and AI Inference | | 0
CogFormer: Learn All Your Models Once | | 0
Delightful Distributed Policy Gradient | | 0
Does This Gradient Spark Joy? | | 0
RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization | | 0
Memory Over Maps: 3D Object Localization Without Reconstruction | | 0
Epistemic Observability in Language Models | | 0
When Negation Is a Geometry Problem in Vision-Language Models | | 0
Permutation-Consensus Listwise Judging for Robust Factuality Evaluation | | 0
ReBOL: Retrieval via Bayesian Optimization with Batched LLM Relevance Observations and Query Reformulation | | 0
Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study | | 0
Shift-Invariant Feature Attribution in the Application of Wireless Electrocardiograms | | 0
Diffutron: A Masked Diffusion Language Model for Turkish Language | | 0
Goal-oriented learning of stochastic dynamical systems using error bounds on path-space observables | | 0
DiffGraph: An Automated Agent-driven Model Merging Framework for In-the-Wild Text-to-Image Generation | | 0
End-to-End Optimization of Polarimetric Measurement and Material Classifier | | 0
Efficient Counterfactual Reasoning in ProbLog via Single World Intervention Programs | | 0
Distributed Gradient Clustering: Convergence and the Effect of Initialization | | 0
Measuring Reasoning Trace Legibility: Can Those Who Understand Teach? | | 0
Lessons and Open Questions from a Unified Study of Camera-Trap Species Recognition Over Time | | 0
Grounded Chess Reasoning in Language Models via Master Distillation | | 0
Revenue-Sharing as Infrastructure: A Distributed Business Model for Generative AI Platforms | | 0
Towards Practical Multimodal Hospital Outbreak Detection | | 0
LLM-Driven Heuristic Synthesis for Industrial Process Control: Lessons from Hot Steel Rolling | | 0
Understanding Behavior Cloning with Action Quantization | | 0
Benchmarking Efficient & Effective Camera Pose Estimation Strategies for Novel View Synthesis | | 0
Forward and inverse problems for measure flows in Bayes Hilbert spaces | | 0
Bounded Coupled AI Learning Dynamics in Tri-Hierarchical Drone Swarms | | 0
Procedural Refinement by LLM-driven Algorithmic Debugging for ARC-AGI-2 | | 0
Hybrid Autoencoder-Isolation Forest approach for time series anomaly detection in C70XP cyclotron operation data at ARRONAX | | 0
ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents | | 0
Interpretable Multiple Myeloma Prognosis with Observational Medical Outcomes Partnership Data | | 0
The production of meaning in the processing of natural language | | 0
Uni-Classifier: Leveraging Video Diffusion Priors for Universal Guidance Classifier | | 0
Multi-Stage Fine-Tuning of Pathology Foundation Models with Head-Diverse Ensembling for White Blood Cell Classification | | 0
Jigsaw Regularization in Whole-Slide Image Classification | | 0
From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators | | 0
A chemical language model for reticular materials design | | 0
CAMA: Exploring Collusive Adversarial Attacks in c-MARL | | 0
Page 29 of 13,200