SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 7076-7100 of 474,278 papers

Title | Status | Hype
ToC: Tree-of-Claims Search with Multi-Agent Language Models | Code | 0
Diversity Has Always Been There in Your Visual Autoregressive Models | Code | 0
Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift | Code | 0
ARISE: Agentic Rubric-Guided Iterative Survey Engine for Automated Scholarly Paper Generation | Code | 0
ResearStudio: A Human-Intervenable Framework for Building Controllable Deep-Research Agents | Code | 0
T2I-RiskyPrompt: A Benchmark for Safety Evaluation, Attack, and Defense on Text-to-Image Model | Code | 0
Automated Interpretable 2D Video Extraction from 3D Echocardiography | Code | 0
Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs | Code | 0
Continual Alignment for SAM: Rethinking Foundation Models for Medical Image Segmentation in Continual Learning | Code | 0
Dual-domain Adaptation Networks for Realistic Image Super-resolution | Code | 0
BiFingerPose: Bimodal Finger Pose Estimation for Touch Devices | Code | 0
MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models | Code | 0
Lost in Translation and Noise: A Deep Dive into the Failure Modes of VLMs on Real-World Tables | Code | 0
Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks | Code | 0
TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval |  | 0
STAMP: Spatial-Temporal Adapter with Multi-Head Pooling |  | 0
SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking |  | 0
SAM 3: Segment Anything with Concepts |  | 0
Fantastic Bugs and Where to Find Them in AI Benchmarks |  | 0
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models |  | 0
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark |  | 0
Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding |  | 0
SAM 3D: 3Dfy Anything in Images |  | 0
AutoBackdoor: Automating Backdoor Attacks via LLM Agents | Code | 0
gfnx: Fast and Scalable Library for Generative Flow Networks in JAX | Code | 0