SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 5376–5400 of 661570 papers

Title | Status | Hype
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization | Code | 2
HISTAI: An Open-Source, Large-Scale Whole Slide Image Dataset for Computational Pathology | Code | 2
AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research | Code | 2
Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets | Code | 2
DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance | Code | 2
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents | Code | 2
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners | Code | 2
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling | Code | 2
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation | Code | 2
Mergenetic: a Simple Evolutionary Model Merging Library | Code | 2
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs | Code | 2
Think Twice Before You Act: Enhancing Agent Behavioral Safety with Thought Correction | Code | 2
Relational Graph Transformer | Code | 2
ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization | Code | 2
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | Code | 2
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy | Code | 2
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning | Code | 2
A Tutorial on Structural Identifiability of Epidemic Models Using StructuralIdentifiability.jl | Code | 2
PnPXAI: A Universal XAI Framework Providing Automatic Explanations Across Diverse Modalities and Models | Code | 2
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Code | 2
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis | Code | 2
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly | Code | 2
VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality | Code | 2
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection | Code | 2