SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 5801-5825 of 661,570 papers

Title | Status | Hype
POLCA: Stochastic Generative Optimization with LLM | Code | 0
SpiralDiff: Spiral Diffusion with LoRA for RGB-to-RAW Conversion Across Cameras | Code | 0
Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening | Code | 0
PiGRAND: Physics-informed Graph Neural Diffusion for Intelligent Additive Manufacturing | Code | 0
Invisible failures in human-AI interactions | Code | 0
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer | Code | 0
SlovKE: A Large-Scale Dataset and LLM Evaluation for Slovak Keyphrase Extraction | Code | 0
InterveneBench: Benchmarking LLMs for Intervention Reasoning and Causal Study Design in Real Social Systems | Code | 0
CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models | Code | 0
Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation | Code | 0
W2T: LoRA Weights Already Know What They Can Do | Code | 0
Vietnamese Automatic Speech Recognition: A Revisit | Code | 0
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents | Code | 0
SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters | Code | 0
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models | Code | 0
Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection | Code | 0
M2IR: Proactive All-in-One Image Restoration via Mamba-style Modulation and Mixture-of-Experts | Code | 0
TopoVST: Toward Topology-fidelitous Vessel Skeleton Tracking | Code | 0
MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal | Code | 0
Mastering the Minority: An Uncertainty-guided Multi-Expert Framework for Challenging-tailed Sequence Learning | Code | 0
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning | Code | 0
Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought | Code | 0
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents | Code | 0
AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms | Code | 0
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling | Code | 0
Page 233 of 26,463