SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 5501–5525 of 661,570 papers

| Title | Status | Hype |
|---|---|---|
| Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search | | 0 |
| Sharing State Between Prompts and Programs | | 1 |
| Token-Level LLM Collaboration via FusionRoute | | 0 |
| Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models | | 0 |
| On Theoretically-Driven LLM Agents for Multi-Dimensional Discourse Analysis | | 0 |
| Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning | | 0 |
| MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs | | 0 |
| Beyond Polarity: Multi-Dimensional LLM Sentiment Signals for WTI Crude Oil Futures Return Prediction | | 0 |
| Overcoming the Modality Gap in Context-Aided Forecasting | | 0 |
| BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models | | 0 |
| Transition Flow Matching | | 0 |
| Loosely-Structured Software: Engineering Context, Structure, and Evolution Entropy in Runtime-Rewired Multi-Agent Systems | | 0 |
| Tackling Over-smoothing on Hypergraphs: A Ricci Flow-guided Neural Diffusion Approach | | 0 |
| LLM-Driven Discovery of High-Entropy Catalysts via Retrieval-Augmented Generation | | 0 |
| Embedding-Aware Feature Discovery: Bridging Latent Representations and Interpretable Features in Event Sequences | | 0 |
| Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models | | 0 |
| S2Act: Simple Spiking Actor | | 0 |
| ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems | | 0 |
| You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector | | 0 |
| Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation | | 0 |
| CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving | | 0 |
| Domain Adaptation Without the Compute Burden for Efficient Whole Slide Image Analysis | | 0 |
| Parallelised Differentiable Straightest Geodesics for 3D Meshes | | 0 |
| Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory | | 0 |
| Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs | | 0 |
Page 221 of 26,463