SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

474,278 papers, 248,326 code links, 4,818 tasks

Papers

Showing 7226–7250 of 474,278 papers

Title | Status | Hype
1-Lipschitz Network Initialization for Certifiably Robust Classification Applications: A Decay Problem |  | 0
Vision Transformers with Self-Distilled Registers |  | 0
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls |  | 0
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO |  | 0
Error-Driven Scene Editing for 3D Grounding in Large Language Models |  | 0
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs | Code | 0
GAIS: Frame-Level Gated Audio-Visual Integration with Semantic Variance-Scaled Perturbation for Text-Video Retrieval |  | 0
The Promise of RL for Autoregressive Image Editing | Code | 0
LENS: Learning to Segment Anything with Unified Reinforced Reasoning | Code | 0
Continual Learning for Image Captioning through Improved Image-Text Alignment | Code | 0
PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration | Code | 0
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D | Code | 0
DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection | Code | 0
AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models | Code | 0
iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion | Code | 0
Hierarchical Semantic Learning for Multi-Class Aorta Segmentation | Code | 0
Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap | Code | 0
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation | Code | 0
Flood-LDM: Generalizable Latent Diffusion Models for rapid and accurate zero-shot High-Resolution Flood Mapping | Code | 0
CafeMed: Causal Attention Fusion Enhanced Medication Recommendation | Code | 0
RTS-Mono: A Real-Time Self-Supervised Monocular Depth Estimation Method for Real-World Deployment | Code | 0
Few-Shot Precise Event Spotting via Unified Multi-Entity Graph and Distillation | Code | 0
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms | Code | 0
Hierarchical Retrieval with Out-Of-Vocabulary Queries: A Case Study on SNOMED CT | Code | 0
FuseSampleAgg: Fused Neighbor Sampling and Aggregation for Mini-batch GNNs | Code | 0
Page 290 of 18,972