SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 16651-16700 of 474278 papers

| Title | Status | Hype |
|---|---|---|
| POLARON: Precision-aware On-device Learning and Adaptive Runtime-cONfigurable AI acceleration | | 0 |
| Towards Biosignals-Free Autonomous Prosthetic Hand Control via Imitation Learning | | 0 |
| FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency | | 0 |
| Segment Concealed Objects with Incomplete Supervision | | 0 |
| WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis | | 0 |
| Propositional Logic for Probing Generalization in Neural Networks | | 0 |
| Edit Flows: Flow Matching with Edit Operations | | 0 |
| mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks | | 0 |
| Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency | | 0 |
| CC-RAG: Structured Multi-Hop Reasoning via Theme-Based Causal Graphs | | 0 |
| Neighbors and relatives: How do speech embeddings reflect linguistic connections across the world? | | 0 |
| Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models | | 0 |
| Advancing STT for Low-Resource Real-World Speech | | 0 |
| Learning to Reason Across Parallel Samples for LLM Reasoning | | 0 |
| H^2GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs | | 0 |
| Private Evolution Converges | | 0 |
| A Simple Analysis of Discretization Error in Diffusion Models | | 0 |
| AlphaFold Database Debiasing for Robust Inverse Folding | | 0 |
| FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion | | 0 |
| Learning to Hear Broken Motors: Signature-Guided Data Augmentation for Induction-Motor Diagnostics | | 0 |
| Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems | | 0 |
| Semi-gradient DICE for Offline Constrained Reinforcement Learning | | 0 |
| Filling in the Blanks: Applying Data Imputation in incomplete Water Metering Data | | 0 |
| Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach | | 0 |
| When Simple Model Just Works: Is Network Traffic Classification in Crisis? | | 0 |
| Towards Fair Representation: Clustering and Consensus | | 0 |
| Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations | | 0 |
| Model-Free Kernel Conformal Depth Measures Algorithm for Uncertainty Quantification in Regression Models in Separable Hilbert Spaces | | 0 |
| sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation | | 0 |
| Flexible and Efficient Drift Detection without Labels | | 0 |
| Superposed Parameterised Quantum Circuits | | 0 |
| DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging | | 0 |
| Enhancing Motion Dynamics of Image-to-Video Models via Adaptive Low-Pass Guidance | | 0 |
| MARMOT: Masked Autoencoder for Modeling Transient Imaging | | 0 |
| Context-aware TFL: A Universal Context-aware Contrastive Learning Framework for Temporal Forgery Localization | | 0 |
| Robust Visual Localization via Semantic-Guided Multi-Scale Transformer | | 0 |
| Towards Cross-Subject EMG Pattern Recognition via Dual-Branch Adversarial Feature Disentanglement | | 0 |
| Local MDI+: Local Feature Importances for Tree-Based Models | | 0 |
| A PDE-Based Image Dehazing Method via Atmospheric Scattering Theory | | 0 |
| Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment | | 0 |
| Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness | | 0 |
| Can A Gamer Train A Mathematical Reasoning Model? | Code | 0 |
| SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning | Code | 1 |
| SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner | Code | 1 |
| DEAL: Disentangling Transformer Head Activations for LLM Steering | | 0 |
| ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering | Code | 2 |
| EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements | Code | 1 |
| CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Code | 1 |
| Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning | Code | 1 |
| syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum | Code | 1 |
Page 334 of 9486