SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 3676-3700 of 661,570 papers

Title | Status | Hype
Evaluating Counterfactual Strategic Reasoning in Large Language Models | | 0
AIMER: Calibration-Free Task-Agnostic MoE Pruning | | 0
Remove360: Benchmarking Residuals After Object Removal in 3D Gaussian Splatting | | 0
LLM-Augmented Changepoint Detection: A Framework for Ensemble Detection and Automated Explanation | | 0
BVSIMC: Bayesian Variable Selection-Guided Inductive Matrix Completion for Improved and Interpretable Drug Discovery | | 0
HypeMed: Enhancing Medication Recommendations with Hypergraph-Based Patient Relationships | | 0
Interpretable Prostate Cancer Detection using a Small Cohort of MRI Images | | 0
NeuroGame Transformer: Gibbs-Inspired Attention Driven by Game Theory and Statistical Physics | Code | 0
Implicit Grading Bias in Large Language Models: How Writing Style Affects Automated Assessment Across Math, Programming, and Essay Tasks | | 0
Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs | | 0
DaPT: A Dual-Path Framework for Multilingual Multi-hop Question Answering | | 0
GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning | | 0
Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation | | 0
OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards | | 0
Evaluating Game Difficulty in Tetris Block Puzzle | | 0
On Optimizing Multimodal Jailbreaks for Spoken Language Models | | 0
Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models | | 0
DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning | | 0
DriveSplat: Unified Neural Gaussian Reconstruction for Dynamic Driving Scenes | | 0
A Unified Generalization Framework for Model Merging: Trade-offs, Non-Linearity, and Scaling Laws | | 0
Is Hierarchical Quantization Essential for Optimal Reconstruction? | | 0
RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach | | 0
Discounted Beta-Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards | | 0
GAPSL: A Gradient-Aligned Parallel Split Learning on Heterogeneous Data | | 0
Transformers Learn Robust In-Context Regression under Distributional Uncertainty | | 0
Page 148 of 26,463