SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 5301-5325 of 661570 papers

| Title | Status | Hype |
| --- | --- | --- |
| Think Before You Lie: How Reasoning Leads to Honesty | | 0 |
| When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents | | 0 |
| SToRM: Supervised Token Reduction for Multi-modal LLMs toward efficient end-to-end autonomous driving | | 0 |
| Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models | | 0 |
| Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning | | 0 |
| Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models | | 0 |
| Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions | | 0 |
| TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models | | 0 |
| BLINK: Behavioral Latent Modeling of NK Cell Cytotoxicity | | 0 |
| Latent-Mark: An Audio Watermark Robust to Neural Resynthesis | | 0 |
| On the Statistical Optimality of Optimal Decision Trees | | 0 |
| Preserving Continuous Symmetry in Discrete Spaces: Geometric-Aware Quantization for SO(3)-Equivariant GNNs | | 0 |
| A Multilingual Human Annotated Corpus of Original and Easy-to-Read Texts to Support Access to Democratic Participatory Processes | | 0 |
| NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks | | 0 |
| Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules | | 0 |
| Structural Causal Bottleneck Models | | 0 |
| ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph | | 0 |
| Open-World Motion Forecasting | | 0 |
| GLM-OCR Technical Report | | 0 |
| MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants | | 1 |
| TOSSS: a CVE-based Software Security Benchmark for Large Language Models | | 0 |
| Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol | | 0 |
| Shape-of-You: Fused Gromov-Wasserstein Optimal Transport for Semantic Correspondence in-the-Wild | | 0 |
| BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder | | 0 |
| Truth as a Compression Artifact in Language Model Training | | 0 |
Page 213 of 26463