SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 10201-10250 of 661570 papers

Title | Status | Hype
Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment | | 0
Breaking the Martingale Curse: Multi-Agent Debate via Asymmetric Cognitive Potential Energy | | 0
A Hybrid Machine Learning Model for Cerebral Palsy Detection | | 0
"Dark Triad" Model Organisms of Misalignment: Narrow Fine-Tuning Mirrors Human Antisocial Behavior | | 0
CREDO: Epistemic-Aware Conformalized Credal Envelopes for Regression | | 0
Step-Level Visual Grounding Faithfulness Predicts Out-of-Distribution Generalization in Long-Horizon Vision-Language Models | | 0
Joint MDPs and Reinforcement Learning in Coupled-Dynamics Environments | | 0
Joint 3D Gravity and Magnetic Inversion via Rectified Flow and Ginzburg-Landau Guidance | | 0
Validation of a Small Language Model for DSM-5 Substance Category Classification in Child Welfare Records | | 0
MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies | | 0
Active View Selection with Perturbed Gaussian Ensemble for Tomographic Reconstruction | | 0
ColonSplat: Reconstruction of Peristaltic Motion in Colonoscopy with Dynamic Gaussian Splatting | | 0
A prior information informed learning architecture for flying trajectory prediction | | 0
Kernel Methods for Some Transport Equations with Application to Learning Kernels for the Approximation of Koopman Eigenfunctions: A Unified Approach via Variational Methods, Green's Functions and the Method of Characteristics | | 0
LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models | | 0
Not Too Short, Not Too Long: How LLM Response Length Shapes People's Critical Thinking in Error Detection | | 0
Physics-informed AI Accelerated Retention Analysis of Ferroelectric Vertical NAND: From Day-Scale TCAD to Second-Scale Surrogate Model | | 0
Distributed Legal Infrastructure for a Trustworthy Agentic Web | | 0
OPTED: Open Preprocessed Trachoma Eye Dataset Using Zero-Shot SAM 3 Segmentation | | 0
Enhancing the Detection of Coronary Artery Disease Using Machine Learning | | 0
Learning From Design Procedure To Generate CAD Programs for Data Augmentation | | 0
Fairness May Backfire: When Leveling-Down Occurs in Fair Machine Learning | | 0
Empowering Locally Deployable Medical Agent via State Enhanced Logical Skills for FHIR-based Clinical Tasks | | 0
XGenBoost: Synthesizing Small and Large Tabular Datasets with XGBoost | | 0
MedInjection-FR: Exploring the Role of Native, Synthetic, and Translated Data in Biomedical Instruction Tuning | | 0
PaQ-DETR: Learning Pattern and Quality-Aware Dynamic Queries for Object Detection | | 0
Small Target Detection Based on Mask-Enhanced Attention Fusion of Visible and Infrared Remote Sensing Images | | 0
HIERAMP: Coarse-to-Fine Autoregressive Amplification for Generative Dataset Distillation | | 0
Swimba: Switch Mamba Model Scales State Space Models | | 0
Physics-Consistent Neural Networks for Learning Deformation and Director Fields in Microstructured Media with Loss-Based Validation Criteria | | 0
Deep Research, Shallow Evaluation: A Case Study in Meta-Evaluation for Long-Form QA Benchmarks | | 0
Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! | | 0
Agent Hunt: Bounty Based Collaborative Autoformalization With LLM Agents | | 0
Toward Generative Quantum Utility via Correlation-Complexity Map | | 0
Slurry-as-a-Service: A Modest Proposal on Scalable Pluralistic Alignment for Nutrient Optimization | | 0
ODD-SEC: Onboard Drone Detection with a Spinning Event Camera | | 0
The Limits of Long-Context Reasoning in Automated Bug Fixing | | 0
Predictive Coding Graphs are a Superset of Feedforward Neural Networks | | 0
IGLU: The Integrated Gaussian Linear Unit Activation Function | | 0
Kinetic-based regularization: Learning spatial derivatives and PDE applications | | 0
An Extended Topological Model For High-Contrast Optical Flow | | 0
Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"? | | 0
Denoising as Path Planning: Training-Free Acceleration of Diffusion Models with DPCache | Code | 0
TADPO: Reinforcement Learning Goes Off-road | | 0
Lyapunov Probes for Hallucination Detection in Large Foundation Models | | 0
GazeMoE: Perception of Gaze Target with Mixture-of-Experts | | 0
How Professional Visual Artists are Negotiating Generative AI in the Workplace | | 0
FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle | | 0
Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries | | 0
Characterizing Evolution in Expectation-Maximization Estimates for Overspecified Mixed Linear Regression | | 0
Page 205 of 13232