SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 3451-3475 of 661570 papers

Title | Status | Hype
Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators | | 0
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models | | 2
Koopman Autoencoders with Continuous-Time Latent Dynamics for Fluid Dynamics Forecasting | | 0
STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification | | 0
1S-DAug: One-Shot Data Augmentation for Robust Few-Shot Generalization | | 0
TS-Haystack: A Multi-Scale Retrieval Benchmark for Time Series Language Models | | 0
From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation at Industry Scale | | 0
Benchmarking State Space Models, Transformers, and Recurrent Networks for US Grid Forecasting | | 0
What You Read is What You Classify: Highlighting Attributions to Text and Text-Like Inputs | | 0
Transformers Remember First, Forget Last: Dual-Process Interference in LLMs | | 0
AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery | | 0
A Unified View of Drifting and Score-Based Models | | 0
Interleaving Scheduling and Motion Planning with Incremental Learning of Symbolic Space-Time Motion Abstractions | | 0
WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference | | 0
Representation Finetuning for Continual Learning | | 0
A Simple Efficiency Incremental Learning Framework via Vision-Language Model with Nonlinear Multi-Adapters | | 0
Reversible Lifelong Model Editing via Semantic Routing-Based LoRA | | 0
A technology-oriented mapping of the language and translation industry: Analysing stakeholder values and their potential implication for translation pedagogy | | 0
COTONET: A custom cotton detection algorithm based on YOLO11 for stage of growth cotton boll detection | | 0
PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation | | 0
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining | | 1
Bridging the Simulation-to-Reality Gap in Electron Microscope Calibration via VAE-EM Estimation | | 0
Nonstandard Errors in AI Agents | | 0
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning | | 0
Harm or Humor: A Multimodal, Multilingual Benchmark for Overt and Covert Harmful Humor | | 0
Page 139 of 26463