SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers | 248,104 code links | 4,818 tasks

Papers

Showing 476-500 of 659,983 papers

| Title | Status | Hype |
| --- | --- | --- |
| Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments | | 0 |
| Flying Pigs, FaR and Beyond: Evaluating LLM Reasoning in Counterfactual Worlds | | 0 |
| PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion | | 0 |
| Decorrelation, Diversity, and Emergent Intelligence: The Isomorphism Between Social Insect Colonies and Ensemble Machine Learning | | 0 |
| Inverting Neural Networks: New Methods to Generate Neural Network Inputs from Prescribed Outputs | | 0 |
| When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning | | 0 |
| Test-Time Adaptation via Cache Personalization for Facial Expression Recognition in Videos | | 0 |
| TimeTox: An LLM-Based Pipeline for Automated Extraction of Time Toxicity from Clinical Trial Protocols | | 0 |
| A transformer architecture alteration to incentivise externalised reasoning | | 0 |
| Bounding Box Anomaly Scoring for simple and efficient Out-of-Distribution detection | | 0 |
| Improving LLM Predictions via Inter-Layer Structural Encoders | | 0 |
| Vision-based Deep Learning Analysis of Unordered Biomedical Tabular Datasets via Optimal Spatial Cartography | | 0 |
| MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation | | 0 |
| Voice Privacy from an Attribute-based Perspective | | 0 |
| PopResume: Causal Fairness Evaluation of LLM/VLM Resume Screeners with Population-Representative Dataset | | 0 |
| SOUPLE: Enhancing Audio-Visual Localization and Segmentation with Learnable Prompt Contexts | | 0 |
| Exposure-Normalized Bed and Chair Fall Rates via Continuous AI Monitoring | | 0 |
| Conditionally Identifiable Latent Representation for Multivariate Time Series with Structural Dynamics | | 0 |
| Stepwise Variational Inference with Vine Copulas | | 0 |
| Asymptotic Learning Curves for Diffusion Models with Random Features Score and Manifold Data | | 0 |
| A Critical Review on the Effectiveness and Privacy Threats of Membership Inference Attacks | | 0 |
| Robustness Quantification and Uncertainty Quantification: Comparing Two Methods for Assessing the Reliability of Classifier Predictions | | 0 |
| VLA-IAP: Training-Free Visual Token Pruning via Interaction Alignment for Vision-Language-Action Models | | 0 |
| Minibal: Balanced Game-Playing Without Opponent Modeling | | 0 |
| Efficient Benchmarking of AI Agents | | 0 |
Page 20 of 26,400