SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers | 248,326 code links | 4,818 tasks

Papers

Showing 20001-20050 of 474278 papers

Title | Status | Hype
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition | | 0
Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule | | 0
PathFL: Multi-Alignment Federated Learning for Pathology Image Segmentation | Code | 0
HydraNet: Momentum-Driven State Space Duality for Multi-Granularity Tennis Tournaments Analysis | Code | 0
When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy | Code | 0
Maximizing Confidence Alone Improves Reasoning | | 0
Can Large Language Models Match the Conclusions of Systematic Reviews? | Code | 0
On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning | Code | 0
GateNLP at SemEval-2025 Task 10: Hierarchical Three-Step Prompting for Multilingual Narrative Classification | Code | 0
Test-time augmentation improves efficiency in conformal prediction | | 0
Pre-Training Curriculum for Multi-Token Prediction in Language Models | Code | 1
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | Code | 2
Training Language Models to Generate Quality Code with Program Analysis Feedback | Code | 1
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | Code | 1
Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation | Code | 1
Preventing Spurious Interactions: A New Inductive Bias for Accurate Treatment Effect Estimation | Code | 0
Nonstationary blind deconvolution using spectral constraints | Code | 0
SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels | Code | 0
Pre-training for Recommendation Unlearning | Code | 0
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Code | 7
Deep Learning-Based BMD Estimation from Radiographs with Conformal Uncertainty Quantification | | 0
Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation | Code | 0
Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape | Code | 0
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL | | 0
Leveraging Diffusion Models for Synthetic Data Augmentation in Protein Subcellular Localization Classification | | 0
BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models | | 0
HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding | | 0
MAC-Gaze: Motion-Aware Continual Calibration for Mobile Gaze Tracking | | 0
Principled Out-of-Distribution Generalization via Simplicity | | 0
Enhancing Lifelong Multi-Agent Path-finding by Using Artificial Potential Fields | | 0
Anomalies by Synthesis: Anomaly Detection using Generative Diffusion Models for Off-Road Navigation | | 0
CrossNAS: A Cross-Layer Neural Architecture Search Framework for PIM Systems | | 0
Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel | | 0
A comprehensive analysis of PINNs: Variants, Applications, and Challenges | | 0
Permissioned LLMs: Enforcing Access Control in Large Language Models | | 0
Operationalizing CaMeL: Strengthening LLM Defenses for Enterprise Deployment | | 0
Security Benefits and Side Effects of Labeling AI-Generated Images | | 0
Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift? | | 0
Private Rate-Constrained Optimization with Applications to Fair Learning | | 0
NegVQA: Can Vision Language Models Understand Negation? | | 0
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates | | 0
NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding | | 0
StressTest: Can YOUR Speech LM Handle the Stress? | | 0
Forecasting Residential Heating and Electricity Demand with Scalable, High-Resolution, Open-Source Models | | 0
Optimal Auction Design for Dynamic Stochastic Environments: Myerson Meets Naor | | 0
Plug-and-Play Posterior Sampling for Blind Inverse Problems | | 0
Kernel-Smoothed Scores for Denoising Diffusion: A Bias-Variance Study | | 0
GLAMP: An Approximate Message Passing Framework for Transfer Learning with Applications to Lasso-based Estimators | | 0
HelixDesign-Binder: A Scalable Production-Grade Platform for Binder Design Built on HelixFold3 | | 0
Do Large Language Models Think Like the Brain? Sentence-Level Evidence from fMRI and Hierarchical Embeddings | | 0
Page 401 of 9486