SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 7876–7900 of 474278 papers

Title | Status | Hype
Fast PINN Eigensolvers via Biconvex Reformulation | Code | 0
CodeClash: Benchmarking Goal-Oriented Software Engineering |  | 0
The Biased Oracle: Assessing LLMs' Understandability and Empathy in Medical Diagnoses | Code | 0
Music Arena: Live Evaluation for Text-to-Music |  | 0
Continual Learning, Not Training: Online Adaptation For Agents |  | 0
GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction | Code | 0
A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models | Code | 0
Dropping the D: RGB-D SLAM Without the Depth Sensor |  | 0
HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning | Code | 0
LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking | Code | 0
Count-Based Approaches Remain Strong: A Benchmark Against Transformer and LLM Pipelines on Structured EHR | Code | 0
MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL | Code | 0
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials | Code | 0
Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning | Code | 0
MOSPA: Human Motion Generation Driven by Spatial Audio | Code | 0
Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation | Code | 0
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks | Code | 0
Layer-Wise Modality Decomposition for Interpretable Multimodal Sensor Fusion | Code | 0
Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective | Code | 0
iFlyBot-VLA Technical Report |  | 0
OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback | Code | 0
RoboOmni: Proactive Robot Manipulation in Omni-modal Context |  | 0
Applying Medical Imaging Tractography Techniques to Painterly Rendering of Images | Code | 0
Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-Knowingly | Code | 0
Exploring the Hidden Capacity of LLMs for One-Step Text Generation |  | 0