SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers · 248,326 code links · 4,818 tasks

Papers

Showing 8501-8525 of 474278 papers

| Title | Status | Hype |
| --- | --- | --- |
| EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs | Code | 0 |
| RINS-T: Robust Implicit Neural Solvers for Time Series Linear Inverse Problems | Code | 0 |
| Agentic Reinforcement Learning for Search is Unsafe | | 0 |
| ConsistEdit: Highly Consistent and Precise Training-free Visual Editing | | 0 |
| HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization | | 0 |
| C-SEO Bench: Does Conversational SEO Work? | Code | 0 |
| Robobench: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models as Embodied Brain | | 0 |
| EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning | | 0 |
| Planned Diffusion | | 0 |
| Accelerating Vision Transformers with Adaptive Patch Sizes | | 0 |
| World-in-World: World Models in a Closed-Loop World | | 0 |
| DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response | Code | 0 |
| KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision | Code | 0 |
| Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs | Code | 0 |
| The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA | Code | 0 |
| Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering | Code | 0 |
| Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses | Code | 0 |
| When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions | Code | 0 |
| TaxoAlign: Scholarly Taxonomy Generation Using Language Models | Code | 0 |
| Adaptive Discretization for Consistency Models | Code | 0 |
| Exploring Structural Degradation in Dense Representations for Self-supervised Learning | Code | 0 |
| A Single Set of Adversarial Clothes Breaks Multiple Defense Methods in the Physical World | Code | 0 |
| Initialize to Generalize: A Stronger Initialization Pipeline for Sparse-View 3DGS | Code | 0 |
| GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver | Code | 0 |
| Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models | Code | 0 |
Page 341 of 18972