SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7551-7575 of 474,278 papers

Title | Status | Hype
PEGNet: A Physics-Embedded Graph Network for Long-Term Stable Multiphysics Simulation | Code | 0
Stabilizing Direct Training of Spiking Neural Networks: Membrane Potential Initialization and Threshold-robust Surrogate Gradient | Code | 0
EMAformer: Enhancing Transformer through Embedding Armor for Time Series Forecasting | Code | 0
An update to PYRO-NN: A Python Library for Differentiable CT Operators | Code | 0
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models | Code | 0
FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding | Code | 0
LBMamba: Locally Bi-directional Mamba | Code | 0
FB-RAG: Improving RAG with Forward and Backward Lookup | - | 0
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents | - | 0
ViPRA: Video Prediction for Robot Actions | - | 0
Auto-US: An Ultrasound Video Diagnosis Agent Using Video Classification Framework and LLMs | Code | 0
S^2M-Former: Spiking Symmetric Mixing Branchformer for Brain Auditory Attention Detection | Code | 0
SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion | Code | 0
VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics | Code | 0
Human Motion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy | Code | 0
Disentangled Representation Learning via Modular Compositional Bias | Code | 0
CC30k: A Citation Contexts Dataset for Reproducibility-Oriented Sentiment Analysis | Code | 0
ReIDMamba: Learning Discriminative Features with Visual State Space Model for Person Re-Identification | Code | 0
Multi-Modal Assistance for Unsupervised Domain Adaptation on Point Cloud 3D Object Detection | Code | 0
EMNLP: Educator-role Moral and Normative Large Language Models Profiling | - | 0
GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning | - | 0
Real-time pothole detection with onboard sensors and camera on vehicles | Code | 0
EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers | Code | 0
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning | - | 0
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence | - | 0