SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

659,983 papers · 248,104 code links · 4,818 tasks

Papers

Showing 701–725 of 659,983 papers

Title | Status | Hype
When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations | | 0
SynLeaF: A Dual-Stage Multimodal Fusion Framework for Synthetic Lethality Prediction Across Pan- and Single-Cancer Contexts | | 0
Causal Evidence that Language Models use Confidence to Drive Behavior | | 0
Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement | | 0
SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection | | 0
Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models | | 0
Gumbel Distillation for Parallel Text Generation | | 0
Noise Titration: Exact Distributional Benchmarking for Probabilistic Time Series Forecasting | | 0
Dyadic: A Scalable Platform for Human-Human and Human-AI Conversation Research | | 0
SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation | | 0
TiCo: Time-Controllable Training for Spoken Dialogue Models | | 0
The Dual Mechanisms of Spatial Reasoning in Vision-Language Models | | 0
3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing | | 0
WorldCache: Content-Aware Caching for Accelerated Video World Models | | 0
Generating and Evaluating Sustainable Procurement Criteria for the Swiss Public Sector using In-Context Prompting with Large Language Models | | 0
Generalized multi-object classification and tracking with sparse feature resonator networks | | 0
Maximum Entropy Relaxation of Multi-Way Cardinality Constraints for Synthetic Population Generation | | 0
A vision-language model and platform for temporally mapping surgery from video | | 0
A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks | | 0
flexvec: SQL Vector Retrieval with Programmatic Embedding Modulation | | 0
Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks | | 0
TrajLoom: Dense Future Trajectory Generation from Video | | 0
Dress-ED: Instruction-Guided Editing for Virtual Try-On and Try-Off | | 0
Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length | | 0
Do Consumers Accept AIs as Moral Compliance Agents? | | 0
Page 29 of 26,400