SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 11101 to 11150 of 661570 papers

Title | Status | Hype
Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective | | 0
Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG | | 0
When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies | | 0
CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field | | 0
Why Do Neural Networks Forget: A Study of Collapse in Continual Learning | | 0
IoUCert: Robustness Verification for Anchor-based Object Detectors | | 0
A Fast Generative Framework for High-dimensional Posterior Sampling: Application to CMB Delensing | | 0
ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model | | 0
Inference-time optimization for experiment-grounded protein ensemble generation | | 0
sFRC for assessing hallucinations in medical image restoration | | 0
Auto-Adaptive PINNs with Applications to Phase Transitions | | 0
Balancing Fidelity, Utility, and Privacy in Synthetic Cardiac MRI Generation: A Comparative Study | | 0
Low-Resource Guidance for Controllable Latent Audio Diffusion | | 0
The Company You Keep: How LLMs Respond to Dark Triad Traits | | 0
CRESTomics: Analyzing Carotid Plaques in the CREST-2 Trial with a New Additive Classification Model | | 0
TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning | | 0
Query-Level Uncertainty in Large Language Models | | 0
UMA: A Family of Universal Models for Atoms | | 0
Fast Equivariant Imaging: Acceleration for Unsupervised Learning via Augmented Lagrangian and Auxiliary PnP Denoisers | | 0
Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification | | 0
ERDES: A Benchmark Video Dataset for Retinal Detachment and Macular Status Classification in Ocular Ultrasound | | 0
On the Generalization Limits of Quantum Generative Adversarial Networks with Pure State Generators | | 0
Trust Me, I Can Convince You: The Contextualized Argument Appraisal Framework | | 0
Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety | | 0
Benchmarking ECG FMs: A Reality Check Across Clinical Tasks | | 0
Circuit Insights: Towards Interpretability Beyond Activations | | 0
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems | | 1
Composition-Grounded Data Synthesis for Visual Reasoning | | 0
MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations | | 0
SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification | | 0
Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations | | 0
Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means | | 0
SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care | | 0
NRR-Phi: Text-to-State Mapping for Ambiguity Preservation in LLM Inference | | 0
LUMINA: Foundation Models for Topology Transferable ACOPF | | 0
Skullptor: High Fidelity 3D Head Reconstruction in Seconds with Multi-View Normal Prediction | | 0
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction | | 0
Motion Manipulation via Unsupervised Keypoint Positioning in Face Animation | | 0
Real5-OmniDocBench: A Full-Scale Physical Reconstruction Benchmark for Robust Document Parsing in the Wild | | 0
Beyond Edge Deletion: A Comprehensive Approach to Counterfactual Explanation in Graph Neural Networks | | 0
PRAM-R: A Perception-Reasoning-Action-Memory Framework with LLM-Guided Modality Routing for Adaptive Autonomous Driving | | 0
Semi-Supervised Generative Learning via Latent Space Distribution Matching | | 0
DiverseDiT: Towards Diverse Representation Learning in Diffusion Transformers | | 0
Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows | | 0
FeedAIde: Guiding App Users to Submit Rich Feedback Reports by Asking Context-Aware Follow-Up Questions | | 0
LikeThis! Empowering App Users to Submit UI Improvement Suggestions Instead of Complaints | | 0
Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback | | 0
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory | | 0
ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos | | 0
SSR: A Generic Framework for Text-Aided Map Compression for Localization | | 0
Page 223 of 13232