SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 1901 to 1950 of 659,983 papers

Title | Status | Hype
Efficient Reasoning with Balanced Thinking | | 2
Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer | | 0
DeeperBrain: A Neuro-Grounded EEG Foundation Model Towards Universal BCI | | 0
Studying the Role of Synthetic Data for Machine Learning-based Wireless Networks Traffic Forecasting | | 0
Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis | | 0
Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators | | 0
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models | | 2
Koopman Autoencoders with Continuous-Time Latent Dynamics for Fluid Dynamics Forecasting | | 0
STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification | | 0
1S-DAug: One-Shot Data Augmentation for Robust Few-Shot Generalization | | 0
TS-Haystack: A Multi-Scale Retrieval Benchmark for Time Series Language Models | | 0
From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation at Industry Scale | | 0
Benchmarking State Space Models, Transformers, and Recurrent Networks for US Grid Forecasting | | 0
What You Read is What You Classify: Highlighting Attributions to Text and Text-Like Inputs | | 0
Transformers Remember First, Forget Last: Dual-Process Interference in LLMs | | 0
AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery | | 0
A Unified View of Drifting and Score-Based Models | | 0
Interleaving Scheduling and Motion Planning with Incremental Learning of Symbolic Space-Time Motion Abstractions | | 0
WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference | | 0
Representation Finetuning for Continual Learning | | 0
A Simple Efficiency Incremental Learning Framework via Vision-Language Model with Nonlinear Multi-Adapters | | 0
Reversible Lifelong Model Editing via Semantic Routing-Based LoRA | | 0
A technology-oriented mapping of the language and translation industry: Analysing stakeholder values and their potential implication for translation pedagogy | | 0
COTONET: A custom cotton detection algorithm based on YOLO11 for stage of growth cotton boll detection | | 0
PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation | | 0
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining | | 1
Bridging the Simulation-to-Reality Gap in Electron Microscope Calibration via VAE-EM Estimation | | 0
Nonstandard Errors in AI Agents | | 0
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning | | 0
Harm or Humor: A Multimodal, Multilingual Benchmark for Overt and Covert Harmful Humor | | 0
The Convergence Frontier: Integrating Machine Learning and High Performance Quantum Computing for Next-Generation Drug Discovery | | 0
TransText: Alpha-as-RGB Representation for Transparent Text Animation | | 0
TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis | Code | 0
Pixel-Accurate Epipolar Guided Matching | | 0
WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior | | 0
SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning | | 0
PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching | | 0
From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents | | 0
Evolutionarily Stable Stackelberg Equilibrium | | 0
Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization | | 0
An SO(3)-equivariant reciprocal-space neural potential for long-range interactions | | 0
AutoScreen-FW: An LLM-based Framework for Resume Screening | | 0
Computational and Statistical Hardness of Calibration Distance | | 0
FlowMS: Flow Matching for De Novo Structure Elucidation from Mass Spectra | | 0
TARo: Token-level Adaptive Routing for LLM Test-time Alignment | | 0
Statistical Testing Framework for Clustering Pipelines by Selective Inference | | 0
The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation | | 0
AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models | | 0
Mind the Rarities: Can Rare Skin Diseases Be Reliably Diagnosed via Diagnostic Reasoning? | | 0
HOMEY: Heuristic Object Masking with Enhanced YOLO for Property Insurance Risk Detection | | 0
Page 39 of 13,200