SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers · 248,326 code links · 4,818 tasks

Papers

Showing 6101–6125 of 661,570 papers

Title | Status | Hype
Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements | | 0
Right for the Wrong Reasons: Epistemic Regret Minimization for Causal Rung Collapse in LLMs | | 0
PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling | | 1
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning | | 0
Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange | | 2
Rigorous Asymptotics for First-Order Algorithms Through the Dynamical Cavity Method | | 0
Emotional Cost Functions for AI Safety: Teaching Agents to Feel the Weight of Irreversible Consequences | | 0
Stop Before You Fail: Operational Capability Boundaries for Mitigating Unproductive Reasoning in Large Reasoning Models | | 0
Delightful Policy Gradient | | 0
Precedence-Constrained Decision Trees and Coverings | | 0
SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI | | 0
The Active Discoverer Framework: Towards Autonomous Physics Reasoning through Neuro-Symbolic LaTeX Synthesis | | 0
LLM-Augmented Release Intelligence: Automated Change Summarization and Impact Analysis in Cloud-Native CI/CD Pipelines | | 0
Fine-tuning MLLMs Without Forgetting Is Easier Than You Think | | 0
D-MEM: Dopamine-Gated Agentic Memory via Reward Prediction Error Routing | | 0
Automatic Inter-document Multi-hop Scientific QA Generation | | 0
Why Inference in Large Models Becomes Decomposable After Training | | 0
Learning Unmasking Policies for Diffusion Language Models | | 0
MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos | | 0
Personalized Cell Segmentation: Benchmark and Framework for Reference-Guided Cell Type Segmentation | | 0
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models | | 0
Central Dogma Transformer II: An AI Microscope for Understanding Cellular Regulatory Mechanisms | | 0
ZOTTA: Test-Time Adaptation with Gradient-Free Zeroth-Order Optimization | | 0
Bringing Model Editing to Generative Recommendation in Cold-Start Scenarios | | 0
Multilingual TinyStories: A Synthetic Combinatorial Corpus of Indic Children's Stories for Training Small Language Models | | 0
Page 245 of 26,463