SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5,201-5,250 of 661,570 papers

Title | Status | Hype
AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems | | 0
Evaluating Causal Discovery Algorithms for Path-Specific Fairness and Utility in Healthcare | | 0
The Importance of Being Smoothly Calibrated | | 0
Automated Counting of Stacked Objects in Industrial Inspection | | 0
Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions | | 0
Lite Any Stereo: Efficient Zero-Shot Stereo Matching | | 0
daVinci-Env: Open SWE Environment Synthesis at Scale | | 0
Intelligent Co-Design: An Interactive LLM Framework for Interior Spatial Design via Multi-Modal Agents | | 0
Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI | Code | 0
Geometric framework for biological evolution | | 0
MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale | | 0
AGCD: Agent-Guided Cross-Modal Decoding for Weather Forecasting | | 0
Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty | | 0
Learnability with Partial Labels and Adaptive Nearest Neighbors | | 0
Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau Equilibrium | | 1
EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models | | 1
Revisiting Model Stitching In the Foundation Model Era | | 0
Neural Value Iteration | | 0
Self-Supervised ImageNet Representations for In Vivo Confocal Microscopy: Tortuosity Grading without Segmentation Maps | | 0
Pretraining and Benchmarking Modern Encoders for Latvian | | 0
Deep Reinforcement Learning for Fano Hypersurfaces | | 0
BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning | | 0
Seismic full-waveform inversion based on a physics-driven generative adversarial network | | 0
SRL-MAD: Structured Residual Latents for One-Class Morphing Attack Detection | | 0
Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks | | 0
Predictive Uncertainty in Short-Term PV Forecasting under Missing Data: A Multiple Imputation Approach | | 0
Are LLMs Good Text Diacritizers? An Arabic and Yoruba Case Study | | 0
T-FIX: Text-Based Explanations with Features Interpretable to eXperts | | 0
Self Voice Conversion as an Attack against Neural Audio Watermarking | | 0
CRASH: Cognitive Reasoning Agent for Safety Hazards in Autonomous Driving | | 0
Generative Semantic HARQ: Latent-Space Text Retransmission and Combining | | 0
Towards Foundation Models for Consensus Rank Aggregation | | 0
Bridging National and International Legal Data: Two Projects Based on the Japanese Legal Standard XML Schema for Comparative Law Studies | | 0
Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits | | 0
InterPol: De-anonymizing LM Arena via Interpolated Preference Learning | | 0
DS^2-Instruct: Domain-Specific Data Synthesis for Large Language Models Instruction Tuning | | 0
BayesBreak: Generalized Hierarchical Bayesian Segmentation with Irregular Designs, Multi-Sample Hierarchies, and Grouped/Latent-Group Designs | | 0
CLRNet: Targetless Extrinsic Calibration for Camera, Lidar and 4D Radar Using Deep Learning | | 0
Algorithmic Trading Strategy Development and Optimisation | | 0
DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training | | 0
Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty | | 0
Efficient Story Point Estimation With Comparative Learning | | 0
LLM-Driven Instance-Specific Heuristic Generation and Selection | | 0
Multiresolution Analysis and Statistical Thresholding on Dynamic Networks | | 0
Convergence and clustering analysis for Mean Shift with radially symmetric, positive definite kernels | | 0
WaRA: Wavelet Low Rank Adaptation | Code | 0
Disentangled Feature Importance | | 0
SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs | | 0
Data-Efficient ASR Personalization for Non-Normative Speech Using an Uncertainty-Based Phoneme Difficulty Score for Guided Sampling | | 0
Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner | | 0
Page 105 of 13,232