SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 7,301–7,325 of 474,278 papers

| Title | Status | Hype |
| --- | --- | --- |
| Mixed-curvature decision trees and random forests | Code | 2 |
| Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations | Code | 2 |
| PnP-Flow: Plug-and-Play Image Restoration with Flow Matching | Code | 2 |
| LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | Code | 2 |
| Curvature Diversity-Driven Deformation and Domain Alignment for Point Cloud | Code | 2 |
| NNetscape Navigator: Complex Demonstrations for Web Agents Without a Demonstrator | Code | 2 |
| AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | Code | 2 |
| CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series | Code | 2 |
| Towards Comprehensive Detection of Chinese Harmful Memes | Code | 2 |
| A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond | Code | 2 |
| CodeJudge: Evaluating Code Generation with Large Language Models | Code | 2 |
| Interpretable Contrastive Monte Carlo Tree Search Reasoning | Code | 2 |
| A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation | Code | 2 |
| Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Code | 2 |
| VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Code | 2 |
| Peeling Back the Layers: An In-Depth Evaluation of Encoder Architectures in Neural News Recommenders | Code | 2 |
| Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks | Code | 2 |
| Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? | Code | 2 |
| Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? | Code | 2 |
| Selective Aggregation for Low-Rank Adaptation in Federated Learning | Code | 2 |
| 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection | Code | 2 |
| MiraGe: Editable 2D Images using Gaussian Splatting | Code | 2 |
| FlipAttack: Jailbreak LLMs via Flipping | Code | 2 |
| From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Code | 2 |
| Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Code | 2 |
Page 293 of 18,972