SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

658,356 papers | 258,391 code links | 4,818 tasks

Papers

Showing 251-300 of 658,356 papers

Title | Status | Hype
Agentic Harness for Real-World Compilers | Code | 0
The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus | Code | 0
ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination | - | 1
Continual Learning for Food Category Classification Dataset: Enhancing Model Adaptability and Performance | - | 0
AIGQ: An End-to-End Hybrid Generative Architecture for E-commerce Query Recommendation | - | 0
RAM: Recover Any 3D Human Motion in-the-Wild | - | 0
NEC-Diff: Noise-Robust Event-RAW Complementary Diffusion for Seeing Motion in Extreme Darkness | Code | 0
ODySSeI: An Open-Source End-to-End Framework for Automated Detection, Segmentation, and Severity Estimation of Lesions in Invasive Coronary Angiography Images | - | 0
Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation | - | 0
DAPA: Distribution Aware Piecewise Activation Functions for On-Device Transformer Inference and Training | - | 0
TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis | - | 0
Pseudo-Labeling for Unsupervised Domain Adaptation with Kernel GLMs | - | 0
VeloxNet: Efficient Spatial Gating for Lightweight Embedded Image Classification | - | 0
Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI's Sora 2 | - | 0
dinov3.seg: Open-Vocabulary Semantic Segmentation with DINOv3 | - | 0
Predicting Hidden Links and Missing Nodes in Scale-Free Networks with Artificial Neural Networks | - | 0
POET: Power-Oriented Evolutionary Tuning for LLM-Based RTL PPA Optimization | - | 0
Do Post-Training Algorithms Actually Differ? A Controlled Study Across Model Scales Uncovers Scale-Dependent Ranking Inversions | - | 0
Diffusion-Guided Semantic Consistency for Multimodal Heterogeneity | - | 0
Spectral Tempering for Embedding Compression in Dense Passage Retrieval | - | 0
Beyond Weighted Summation: Learnable Nonlinear Aggregation Functions for Robust Artificial Neurons | - | 0
Exploring the Agentic Frontier of Verilog Code Generation | - | 0
Anatomical Heterogeneity in Transformer Language Models | - | 0
A Mathematical Theory of Understanding | - | 0
A Novel Solution for Zero-Day Attack Detection in IDS using Self-Attention and Jensen-Shannon Divergence in WGAN-GP | - | 0
Warm-Start Flow Matching for Guaranteed Fast Text/Image Generation | - | 0
Factored Levenberg-Marquardt for Diffeomorphic Image Registration: An efficient optimizer for FireANTs | - | 0
Automated Membership Inference Attacks: Discovering MIA Signal Computations using LLM Agents | - | 0
Bridging Conformal Prediction and Scenario Optimization: Discarded Constraints and Modular Risk Allocation | - | 0
Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning | - | 0
Scalable Prompt Routing via Fine-Grained Latent Task Discovery | - | 0
Investigating In-Context Privacy Learning by Integrating User-Facing Privacy Tools into Conversational Agents | - | 0
The Autonomy Tax: Defense Training Breaks LLM Agents | - | 0
Is Evaluation Awareness Just Format Sensitivity? Limitations of Probe-Based Evidence under Controlled Prompt Structure | - | 0
Vocabulary shapes cross-lingual variation of word-order learnability in language models | - | 0
When both Grounding and not Grounding are Bad -- A Partially Grounded Encoding of Planning into SAT (Extended Version) | - | 0
Subspace Projection Methods for Fast Spectral Embeddings of Evolving Graphs | - | 0
Near-Equivalent Q-learning Policies for Dynamic Treatment Regimes | - | 0
LoFi: Location-Aware Fine-Grained Representation Learning for Chest X-ray | - | 0
TrustFlow: Topic-Aware Vector Reputation Propagation for Multi-Agent Ecosystems | - | 0
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL | - | 0
In-the-Wild Camouflage Attack on Vehicle Detectors through Controllable Image Editing | - | 0
GeoLAN: Geometric Learning of Latent Explanatory Directions in Large Language Models | - | 0
Deep Hilbert--Galerkin Methods for Infinite-Dimensional PDEs and Optimal Control | - | 0
Hyperagents | - | 4
Global Convergence of Multiplicative Updates for the Matrix Mechanism: A Collaborative Proof with Gemini 3 | - | 0
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models | - | 1
A Framework for Formalizing LLM Agent Security | - | 0
Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids | - | 0
Narrative Aligned Long Form Video Question Answering | - | 0