SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 8501–8550 of 661,570 papers

Title | Status | Hype
Bayesian Hierarchical Models and the Maximum Entropy Principle | | 0
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards | | 0
ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping | | 0
Mashup Learning: Faster Finetuning by Remixing Past Checkpoints | | 0
Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding | | 0
Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis | | 0
CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research? | Code | 0
ALARM: Audio-Language Alignment for Reasoning Models | | 0
Improved Robustness of Deep Reinforcement Learning for Control of Time-Varying Systems by Bounded Extremum Seeking | | 0
The Affine Divergence: Aligning Activation Updates Beyond Normalisation | | 0
PostTrainBench: Can LLM Agents Automate LLM Post-Training? | | 0
LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation | | 0
Towards Understanding Adam Convergence on Highly Degenerate Polynomials | | 0
Do What I Say: A Spoken Prompt Dataset for Instruction-Following | | 0
Leveraging whole slide difficulty in Multiple Instance Learning to improve prostate cancer grading | | 0
Pathwise Test-Time Correction for Autoregressive Long Video Generation | | 0
Discovery of a Hematopoietic Manifold in scGPT Yields a Method for Extracting Performant Algorithms from Biological Foundation Model Internals | | 0
Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis | | 0
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR | Code | 0
Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization | Code | 0
Directional Textual Inversion for Personalized Text-to-Image Generation | Code | 0
Latent Equivariant Operators for Robust Object Recognition: Promises and Challenges | Code | 0
Breaking the Factorization Barrier in Diffusion Language Models | Code | 0
SlowBA: An efficiency backdoor attack towards VLM-based GUI agents | Code | 0
PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue | Code | 0
CIGPose: Causal Intervention Graph Neural Network for Whole-Body Pose Estimation | Code | 0
ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning | Code | 0
Test-time Ego-Exo-centric Adaptation for Action Anticipation via Multi-Label Prototype Growing and Dual-Clue Consistency | Code | 0
From Data Statistics to Feature Geometry: How Correlations Shape Superposition | Code | 0
No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models | Code | 0
SGI: Structured 2D Gaussians for Efficient and Compact Large Image Representation | Code | 0
Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts | Code | 0
Robotic Ultrasound Makes CBCT Alive | Code | 0
More than the Sum: Panorama-Language Models for Adverse Omni-Scenes | Code | 0
DT-BEHRT: Disease Trajectory-aware Transformer for Interpretable Patient Representation Learning | Code | 0
Making Training-Free Diffusion Segmentors Scale with the Generative Power | Code | 0
Compiler-First State Space Duality and Portable O(1) Autoregressive Caching for Inference | Code | 0
BinaryAttention: One-Bit QK-Attention for Vision and Diffusion Transformers | Code | 0
MM-algorithms for traditional and convex NMF with Tweedie and Negative Binomial cost functions and empirical evaluation | Code | 0
IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch | Code | 0
HTMuon: Improving Muon via Heavy-Tailed Spectral Correction | Code | 0
KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization | Code | 0
A Survey of Weight Space Learning: Understanding, Representation, and Generation | Code | 0
HG-Lane: High-Fidelity Generation of Lane Scenes under Adverse Weather and Lighting Conditions without Re-annotation | Code | 0
OilSAM2: Memory-Augmented SAM2 for Scalable SAR Oil Spill Detection | Code | 0
PanoAffordanceNet: Towards Holistic Affordance Grounding in 360° Indoor Environments | Code | 0
Multimodal Classification via Total Correlation Maximization | Code | 0
Video-Based Reward Modeling for Computer-Use Agents | | 1
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing | | 3
TinyNav: End-to-End TinyML for Real-Time Autonomous Navigation on Microcontrollers | | 1
Page 171 of 13,232